The latest News and Information on Service Reliability Engineering and related technologies.

A Single Hub for Telemetry: OpenTelemetry Gateway

Sep 1, 2025 By Anjali Udasi In Last9

The OpenTelemetry Gateway (OTel Gateway) is a centralized service that collects, processes, and routes telemetry data—metrics, traces, and logs—across your infrastructure. In a typical setup, each service pushes telemetry directly to an observability backend. While this approach works well for small environments, it becomes increasingly difficult to manage as systems grow.

How to Choose the Right Incident Management Tool for Your Team

Aug 29, 2025 By Vishal Padghan In Squadcast

IT disruptions are inevitable. What separates a resilient organization from the rest is its ability to respond quickly, efficiently, and collaboratively to incidents. The cornerstone of such responsiveness? The right incident management tool. But with a market flooded with tools, each promising to revolutionize your workflows, how do you pick the one that truly fits your team's needs? In this blog, we'll break down the key factors to consider when selecting an incident management tool, ensuring you make an informed decision that enhances your team's effectiveness and reliability.

A Practical Guide to Python Application Performance Monitoring (APM)

Aug 29, 2025 By Anjali Udasi In Last9

When your Python app starts slowing down (maybe queries are taking longer, memory keeps creeping up, or API calls are lagging), basic server metrics won’t tell you why. You need to see what’s happening inside the application itself. That’s the role of Application Performance Monitoring (APM). It gives you a breakdown of database queries, external API calls, memory usage, error rates, and more, so you can connect the dots between code and performance.

What is Database Monitoring

Aug 28, 2025 By Anjali Udasi In Last9

Database monitoring transforms from a reactive troubleshooting exercise into a proactive optimization strategy when you have the right tools and approaches in place. This blog shares practical ways to choose monitoring solutions, set up observability for different database platforms, and design workflows that scale in modern distributed systems.

Incident Response for DevOps, SREs, and IT Teams

Aug 25, 2025 By Sreekar In Spike

That 3 AM alert is never fun. Your heart races as you try to figure out what broke this time, and how fast you can fix it. But with an incident response plan in place, that panic turns into a calm, step-by-step fix. It helps you handle everything, from a server crash to a security breach, in an organized way. In this guide, I’ll walk you through what exactly an incident response plan is, why you need it, its key components, and how to build one.

OpenTelemetry API vs SDK: Understanding the Architecture

Aug 25, 2025 By Anjali Udasi In Last9

When you're instrumenting applications with OpenTelemetry, you'll encounter two core components: the API and the SDK. The API defines what telemetry data looks like and how it is created, while the SDK handles how that data is processed and exported. Understanding this split helps you build more maintainable observability and avoid tight coupling between your business logic and telemetry infrastructure.
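
The split described above can be illustrated with a small, standard-library-only sketch. It does not use the real OpenTelemetry packages; `Tracer`, `NoOpTracer`, `RecordingTracer`, `set_tracer`, `get_tracer`, and `checkout` are hypothetical names chosen for illustration. The idea it shows is that business code depends only on a small interface (the "API"), while a separately installed implementation (the "SDK") decides what happens to the data.

```python
from typing import List, Protocol


class Tracer(Protocol):
    """'API' side: the only surface application code depends on."""

    def start_span(self, name: str) -> None: ...


class NoOpTracer:
    """Default when no 'SDK' is installed: calls are accepted and dropped."""

    def start_span(self, name: str) -> None:
        pass


class RecordingTracer:
    """'SDK' side: decides what happens to spans (here, kept in a list)."""

    def __init__(self) -> None:
        self.exported: List[str] = []

    def start_span(self, name: str) -> None:
        self.exported.append(name)


# Hypothetical global registry; starts with the no-op implementation.
_tracer: Tracer = NoOpTracer()


def set_tracer(tracer: Tracer) -> None:
    """Called once at startup by whoever wires up telemetry."""
    global _tracer
    _tracer = tracer


def get_tracer() -> Tracer:
    return _tracer


def checkout(cart_total: float) -> float:
    """Business logic: touches only the 'API', never the 'SDK'."""
    get_tracer().start_span("checkout")
    return cart_total + 5.0  # hypothetical flat shipping fee
```

With this layout, `checkout` behaves the same whether or not a recording tracer has been installed, and swapping the implementation requires no change to the business code.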

APM Logs: How to Get Started for Faster Debugging

Aug 21, 2025 By Anjali Udasi In Last9

When application performance monitoring detects a spike in latency or error rates, the immediate challenge is determining the underlying cause. APM logs address this by correlating performance metrics with the specific log events that occurred at the same time. Instead of switching between monitoring dashboards and manually searching through log files, APM log correlation consolidates both views.
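
As a rough illustration of the time-based correlation described above, the sketch below selects the log events that fall within a window around a detected spike. It is a simplified assumption of how such a lookup might work, not how any particular APM product implements it; `LogEvent`, `logs_near`, and the 30-second default window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class LogEvent:
    timestamp: datetime
    message: str


def logs_near(
    spike_at: datetime,
    logs: List[LogEvent],
    window: timedelta = timedelta(seconds=30),
) -> List[LogEvent]:
    """Return log events within `window` on either side of the spike time."""
    return [event for event in logs if abs(event.timestamp - spike_at) <= window]


# Example data: one event well before the spike, two close to it.
spike = datetime(2025, 8, 21, 12, 0, 0)
logs = [
    LogEvent(spike - timedelta(minutes=5), "cache warmed"),
    LogEvent(spike - timedelta(seconds=10), "db pool exhausted"),
    LogEvent(spike + timedelta(seconds=20), "retrying payment call"),
]
related = logs_near(spike, logs)
```

Here `related` holds the two events nearest the spike. A timestamp window is only one possible join key; correlating on a shared request or trace identifier is a common alternative when logs carry one.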

A Detailed Guide to Azure Kubernetes Service Monitoring

Aug 20, 2025 By Faiz Shaikh In Last9

Azure Kubernetes Service (AKS) continuously generates a high volume of telemetry, ranging from node-level CPU and memory usage to request latencies and error rates within individual pods and services. Without a structured monitoring strategy, this flood of metrics can easily become noise, leaving teams blind to early warning signs. Effective monitoring in AKS is about identifying the right signals, correlating them across layers, and acting before they impact application performance or cluster stability.

Your Apps Are Green. Your Infrastructure Is Dying.

Aug 20, 2025 By Nishant Modak In Last9

Launch Week Day 3: Introducing Discover Infrastructure. Your dashboard looks perfect. APIs responding in 80ms, background jobs processing smoothly, error rates at 0.02%. Everything's green. Then production breaks. "Why is checkout so slow?" "The payment service keeps timing out!" You run kubectl get pods and discover payment-service pods restarting every 3 minutes due to OOM kills. Then you check your database host: CPU at 98% because someone forgot the new ML training job runs there too.

Discover Infrastructure: Kubernetes & Hosts - Launch Week / Day 03

Aug 20, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Stop debugging infrastructure issues across multiple dashboards. See how Last9's Discover Infrastructure monitors K8s pods and traditional hosts together—with resource analysis, pod-level debugging, and AI that correlates app problems to infrastructure root causes. One setup (K8s + host monitoring) → Complete infrastructure visibility that connects to your services and jobs. No more blind spots between application performance and underlying resources.
