Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

Debug Logging: A Comprehensive Guide for Developers

When an app breaks and there's no clear clue why, debug logs often hold the answers. They record what the code was doing at each step, making it easier to trace back and spot what went wrong. This guide covers what debug logging is, why it’s useful, and how to use it without turning logs into a wall of noise.

HAProxy vs NGINX Performance: A Comprehensive Analysis

When architecting high-performance infrastructure capable of handling substantial traffic loads, the choice of load balancer is a critical decision that can significantly impact system reliability, performance, and cost-efficiency. Among the leading contenders, HAProxy and NGINX stand out as mature, battle-tested solutions with distinct strengths and characteristics.

How to Use Prometheus for APM

Monitoring applications shouldn’t be a guessing game. But too often, DevOps engineers end up buried under a pile of metrics that don’t help when things go wrong. That’s where Prometheus APM comes in. It offers a straightforward way to make sense of your systems—especially when you're working with modern, distributed setups like microservices.

Squadcast Strengthens Its Leadership in IT Alerting and Incident Management in the G2 Spring Report

2025 has already started out to be a remarkable year for Squadcast—with our key wins in the G2 Spring Reports, our acquisition by SolarWinds, and a series of impactful product releases and improvements. Our mission has always been clear: to deliver a unified platform that seamlessly integrates On-Call Management and Incident Response, empowering teams to boost service reliability and productivity—all without the burden of context switching.

Comparing ELK, Grafana, and Prometheus for Observability

Monitoring and observability are cornerstones of modern infrastructure management. Three popular solutions that often come up in this space are the ELK Stack, Grafana, and Prometheus. This comparison breaks down the key differences, use cases, and integration capabilities to help you determine which tool or combination better suits your operational needs.

Metrics That Matter: Measuring Developer Productivity in the AI Era

In this episode, Ryan McDonald is joined by Mark Quigley, Head of Platform Engineering at Ninety.io, for a conversation that cuts through the noise around developer productivity metrics and AI. Mark dives deep into how teams can measure what matters—without falling into the trap of turning every measure into a target. He shares how tools like Developer NPS, DORA metrics, and balanced scorecards can help teams optimize for both output and well-being—but only when framed with the right intent.

How to View and Understand VPC Flow Logs

If you're running workloads in AWS, you've probably heard about VPC Flow Logs. These logs are your eyes and ears for network traffic in your Virtual Private Cloud, and knowing how to check them properly can save you hours of troubleshooting headaches. Whether you're tracking down connectivity issues or monitoring for suspicious activity, this guide will walk you through checking VPC flow logs step by step, with practical examples you can apply today.

Envoy vs HAProxy: Which Proxy Server Is Right for Your Infrastructure?

Choosing between Envoy and HAProxy isn't just about picking a proxy server. It's about deciding which tool will handle your traffic, balance your loads, and keep your services running when everything else wants to crash. If you're a DevOps engineer or system admin weighing these options, you're in the right place.