Operations | Monitoring | ITSM | DevOps | Cloud

Blameless

How to Choose Monitoring Tools for DevOps and SRE

When developing for reliability or implementing resilient DevOps practices, the heart of your decision-making is data. Without carefully monitoring key metrics like uptime, network load, and resource usage, you’ll be blind to where to spend development efforts or refine operation practices. Fortunately, a wide variety of monitoring tools are available to help you collect and get visibility into this data.

Leaders, Here's how to Encourage Full Service Ownership

Service ownership is becoming common practice and its benefits are well-known. These perks include happier customers, aligned teams, and fewer incidents. While this sounds great, it’s often easier said than done, requiring a culture and mindset shift. Leadership will need to encourage and empower teams to adopt the “you build it, you run it” mentality. Here are some ways leaders can help get teams on board.

How SLOs Help Your Team with Service Ownership

Service ownership is becoming a best practice for teams looking to innovate while maintaining the level of reliability that customers expect. Service ownership means seeing the service through its entire lifecycle. In short, it means you build it, you run it. You’ll be responsible for the service’s security, reliability, performance, and quality. This doesn’t mean you won’t have help from SREs to optimize or automate toil.

Webinar: Modern Metrics to Understand Operational Health

In this webinar, you'll learn what are the SRE metrics to better gain insights into operations health. We walk through common challenges and pain points in understanding operations health, metrics to measure based on your maturity journey, and a live demo to show solutions in action.

5 Tips for Getting Alert Fatigue Under Control

What happens when you receive a notification that something is wrong with your system and you have no clue what it means, or why you’re receiving that alert? Maybe you have to parse through the alert conditions to suss out what the alert indicates, or maybe you need to ping a coworker and ask. Not knowing what to do with an alert also contributes to alert fatigue, because it increases the toil and time required to respond.

Leadership and Innovation with Instacart's VP of Infrastructure

Blameless CEO Ashar Rizqi recently had the pleasure of interviewing Dustin Pearce in a virtual executive fireside chat and AMA. Dustin is an experienced leader in scaling hyper-growth, cloud-native companies, as the VP of Infrastructure at Instacart and having previously served as Head of Service Engineering at Slack.

Promoting Continuous Learning with SRE

With the extreme changes we’ve all been through these last several months, it should come as no surprise that our jobs have changed drastically, too. We’re working remotely. We’re dealing with increased resource constraints. Our services are receiving more traffic than usual, and we’re tasked with keeping things up and running. Our work-as-done may not match what we did at the beginning of 2020.