The latest News and Information on Service Reliability Engineering and related technologies.

Full-Stack Observability: What It Is [Minus the Fluff]

Mar 17, 2025 By Anjali Udasi In Last9

You've heard the term thrown around in meetups and Slack channels, but what exactly is full-stack observability? Simply put, it is the ability to see, understand, and quickly act on everything happening across your entire tech stack, from frontend user interactions to backend services, cloud infrastructure, and third-party integrations. Full-stack observability isn't just another tech buzzword. It's the difference between being blindsided by outages and catching issues before your users tweet about them.

Distributed Tracing: An Advanced Guide for DevOps & SREs

Mar 17, 2025 By Anjali Udasi In Last9

In the microservices world, tracking down performance issues feels like solving a mystery with pieces scattered across dozens of systems. When users report slowness, your team needs answers fast, not hours of guesswork. Distributed tracing has emerged as the solution, but implementing it effectively requires more than just understanding the basics. This guide takes you beyond the fundamentals to show you how DevOps teams and SREs can build truly effective tracing strategies.

systemctl: The Complete Guide to Managing Linux Services

Mar 17, 2025 By Prathamesh Sonpatki In Last9

Ever found yourself staring at your terminal, wondering why a service won’t start? systemctl is the backbone of modern Linux service management, but if you’re new to it, it can feel overwhelming. This guide breaks it down—covering essential commands and advanced techniques in a clear, practical way. No unnecessary jargon, just the know-how you need to manage services with confidence.

Syslog Servers Explained: How They Help with Logging

Mar 17, 2025 By Preeti Dewani In Last9

Your team lead just dropped, "We need to set up a syslog server," and now you're wondering what you've signed up for. Syslog servers aren’t just another checkbox in your infrastructure; they’re the quiet workhorses that keep logs organized and accessible. When things go wrong, they help you connect the dots faster. Imagine this: It’s 3 AM, and alerts are flooding in. Your authentication service is failing, but the logs on that server show nothing unusual.

A Practical Guide to the OpenTelemetry Java Agent

Mar 13, 2025 By Prathamesh Sonpatki In Last9

Ever felt like you're missing crucial insights into your Java applications? The OpenTelemetry Java Agent changes that game completely. This comprehensive guide takes you beyond the basics, showing you not just how to implement it, but how to master it for maximum observability.

The Complete Guide to Monitoring Container CPU Usage

Mar 13, 2025 By Anjali Udasi In Last9

Have you ever opened your Kubernetes dashboard and wondered why your app seems to slow down? As containers multiply rapidly, keeping track of CPU usage becomes a must. Let’s break it down by focusing on one key metric: container_cpu_usage_seconds_total.

How to Set Up Logging in Node.js (Without Overthinking It)

Mar 13, 2025 By Preeti Dewani In Last9

Logging in Node.js might not be the most exciting part of development, but it’s one of the most important. Whether you're troubleshooting bugs or keeping track of how your app is running, good logs make life easier. Let’s break down how to set up logging the right way.

Scientific Incident Management with Dan Slimmon

Mar 13, 2025 By Rootly In Rootly

Dan Slimmon is an incident management veteran who's worked at Etsy, HashiCorp, and now leads consulting and training on pragmatic, non-bureaucratic incident response. In this episode, Dan shares his philosophy on "scientific incident response," the importance of hypothesis-driven troubleshooting, and why incidents should be seen as normal in complex systems.

Essential Prometheus Queries: Simple to Advanced

Mar 13, 2025 By Anjali Udasi In Last9

Monitoring your infrastructure doesn't have to be a headache. With Prometheus, you've got a powerful ally in your corner—but like any tool, knowing how to use it makes all the difference. Let's cut through the noise and get straight to the good stuff: practical Prometheus query examples that extract exactly the insights you need when you need them most.

Prometheus Port Configuration: A Detailed Guide

Mar 12, 2025 By Prathamesh Sonpatki In Last9

Setting up Prometheus should be straightforward, but when metrics stop flowing, it’s usually something simple—like a port issue. Misconfigure it, and suddenly, your whole monitoring setup feels like a guessing game. This guide breaks down how to configure Prometheus ports properly, whether you're sticking to defaults or need a custom setup.
