Operations | Monitoring | ITSM | DevOps | Cloud

Gremlin

Setting better SLOs using Google's Golden Signals

To many engineers, the idea that you can accurately and comprehensively track your application's user experience using just a few simple metrics might sound far-fetched. Believe it or not, there are four metrics that aim to do just that. They're called the four Golden Signals and should be a core part of your observability and reliability practices.

SRE vs DevOps: Can they coexist or do they compete?

Systems fail, sometimes publicly and at great cost. Airlines have experienced system-wide ticketing outages, causing hundreds of flight cancellations and significant inconvenience to customers. Retailers have experienced website crashes on the busiest shopping days of the year, costing millions in lost revenue and customer goodwill. It is vital to understand both DevOps and SRE and the roles they play in preventing such outages.

What is a "service" in a microservices architecture?

The past ten years marked a significant change in how software teams build and deploy applications. We moved away from bulky, slow, monolithic applications toward lightweight, scalable, distributed service-based applications. Meanwhile, tools like Docker, Kubernetes, and other container platforms helped accelerate this process. Despite this sudden growth, a fundamental question remains: what exactly is a service, and how does it fit into a microservice architecture?

What are the four Golden Signals?

When it comes to building reliable and scalable software, few organizations have as much authority and expertise as Google. Their Site Reliability Engineering Handbook, first published in 2016, details their practices to maintain reliability as Google scaled. But when you have over a million servers running thousands of services across more than twenty data centers, how do you monitor them in a consistent, logical, and relevant way?

Four tests to measure and improve reliability: what matters and how it works

Legendary race car driver Carroll Smith once said, "until we have established reliability, there is no sense at all in wasting time trying to make the thing go faster." Even though he was referring to cars, the same goes for technology: no amount of code optimization or new features can replace stable systems. Unfortunately, much like race cars, it's hard to know that a system is unreliable until it blows a tire, the brakes stop working, or the steering wheel comes off the column.

How to add a Golden Signal to a service in Gremlin RM

In this video, we show you how to add a Golden Signal to a service. Gremlin uses your Golden Signals to ensure your services are still healthy and responsive during reliability tests. You can configure Golden Signals to use an existing monitor in your observability tools, such as Datadog, New Relic, or Prometheus. We recommend adding all four Golden Signals to each of your services to ensure comprehensive coverage.