SRE

The latest news and information on Site Reliability Engineering and related technologies.

5 Reliability Insights That Immediately Transform Your SRE

Infrastructure engineers can learn a great deal from studying past incidents. Blameless Reliability Insights helps you find patterns that better equip you to deal with incidents to come. If you’ve never used it before and you’re curious what it looks like, you can watch a video demo here! These statistical insights help you learn as much as possible when something goes wrong.

Getting AWS CloudTrail alerts via SNS Endpoint

Logging and auditing have long been an essential part of troubleshooting application and infrastructure performance. With them, you can quickly spot areas of risk and correct or prevent issues. In this blog, we will explore the AWS CloudTrail service and discuss how integrating it with Squadcast can help you route alerts to the right users for quick and efficient incident response. Let's get started.
Sponsored Post

Simplifying SLO and Error Budget tracking for SRE teams

Service level objectives (SLOs) and the service level indicators (SLIs) that underpin them are the foundation of a strong SRE culture: they promote accountability, trust and timely innovation. We are on a mission to simplify SLO and Error Budget tracking, and with that aim in mind we have added the SLO Tracker feature to the Squadcast platform. SLO Tracker seeks to provide a simple and effective way to keep track of your error budget burn rate without the hassle of configuring and aggregating multiple data sources.
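
For readers unfamiliar with the term, burn rate is commonly defined as the observed error rate divided by the error rate the SLO allows. The sketch below illustrates that general definition only; it is not how SLO Tracker computes it, and the function name and request counts are hypothetical.

```python
def burn_rate(failed_requests: int, total_requests: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget would be used up exactly at the end
    of the SLO window; values above 1.0 mean it runs out early.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    observed_error_rate = failed_requests / total_requests
    allowed_error_rate = 1.0 - slo_target  # the error budget as a fraction
    return observed_error_rate / allowed_error_rate


# Hypothetical numbers: 50 failures in 10,000 requests against a 99.9% SLO.
rate = burn_rate(failed_requests=50, total_requests=10_000, slo_target=0.999)
print(f"burn rate: {rate:.1f}x")
```

With these illustrative numbers the service is failing at roughly five times the rate the SLO permits.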

5 Tips If You're the 1st SRE Hire by Instacart's First SRE

Site Reliability Engineers (SREs) have a considerable set of tasks to juggle no matter where they work or how long their company has had an SRE practice. But if you’re the very first SRE to join an organization – as many SREs are these days, given that the SRE trend is trickling down into smaller and smaller companies – you face a special group of challenges. You may find it difficult to get buy-in for SRE from other technical teams.

Error Budgets: Ultimate SRE Guide For Teams

No engineered system can guarantee 100% uptime. Unforeseen failures are bound to cause downtime or a poor experience for customers, so it is best practice to allow a margin for plausible failures. An error budget is that margin: a number of hours of failure, communicated to the customer in advance, that they agree to tolerate.
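
As a rough worked example, an availability target can be turned into a downtime allowance by multiplying the length of the window by the fraction of time the target permits the service to be down. The 30-day window and the function names below are assumptions made for illustration, not something the article specifies.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO allows over the window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


def budget_remaining_minutes(
    slo_target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after the downtime already incurred (negative if overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# A 99.9% target over an assumed 30-day window.
budget = error_budget_minutes(0.999)
left = budget_remaining_minutes(0.999, downtime_minutes=10.0)
print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

Under these assumptions a 99.9% target leaves about 43 minutes of downtime per 30 days, and a 100% target leaves none.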

10 Reasons You Need A Service Level Agreement & Why It's Important

A Service Level Agreement (SLA) consists of many service commitments. It is an essential part of a contract to outsource software development or software support between two or more parties, specifying the duties and the quality and type of service a company would provide for a fee to a customer.