SRE

The latest news and information on Site Reliability Engineering and related technologies.

SRE Fundamentals: Everything you need to know

Google has had an outsized impact on the world, from its unrivaled search engine to its expansion into a range of customer-focused services. It would be difficult to make an impact of this magnitude without also leading the way in the software development industry. One of its biggest contributions to the community is a set of principles known as site reliability engineering, or SRE.

Setting better SLOs using Google's Golden Signals

To many engineers, the idea that you can accurately and comprehensively track your application's user experience using just a few simple metrics might sound far-fetched. Believe it or not, there are four metrics that aim to do just that. They're called the four Golden Signals and should be a core part of your observability and reliability practices.

The Blameless Complete Guide to Incident Management

Incidents are inevitable. As your service expands and becomes more complex, you are more likely to encounter outages, slowdowns, errors, and other disruptions to healthy operation. At the same time, as your service becomes more popular and users come to rely on it, the cost of each incident rises. Studies have shown that the cost of downtime is high and growing fast in the digital-first world. Since you can never fully prevent incidents, it's important to resolve them as efficiently as possible.

How Many SREs Does Your Company Need? Here's How to Decide

So you’ve decided to take advantage of Site Reliability Engineering by hiring SREs for your company. Now you have a second decision to make: exactly how many SREs to hire. Do you need just one or two SREs? Or should you build a sprawling SRE team, with a dozen or more SREs on hand to support your organization’s reliability needs? The answers to these questions will, of course, vary; every business’s needs are different.

Announcing Incident watchers: Subscribe to incidents and receive incident updates in real time

Hey folks, we’re back with another feature update for all our customers! We recently went live with the incident watchers feature, which lives on the incident details page. This blog will outline how you can access the feature, what its primary functions are, and how we expect it to improve your incident management process. Note: This feature will be available to pro, premium and enterprise plan users only.

Kubernetes alternatives to the Spring Java framework

Spring Cloud and Kubernetes complement each other in building a cloud-native platform and running microservices in containers on Kubernetes. Kubernetes provides many features that are similar to those of Spring Cloud and Spring Config Server. The Spring framework has been around for many years. Even today, many organizations prefer to go with Spring libraries because they provide so many features. It is a big advantage when developers have full control over cloud configuration alongside the business logic source code.

Introducing Squadcast Premium

For the last few years, Squadcast has been building out a market-leading on-call and alert management solution. Over the past few quarters, we have significantly enhanced our on-call product by releasing and improving features related to Incident Response, including Slack / MS Teams integration, Runbooks, Postmortems, Service Level Objectives, and Status Pages. We believe that a reliability platform involves both on-call and incident response; one cannot work effectively without the other.