The latest news and information on Site Reliability Engineering and related technologies.

Squadcast Strengthens Its Leadership in IT Alerting and Incident Management in the G2 Spring Report

Apr 9, 2025 By Sanjog Sandhu In Squadcast

2025 is already shaping up to be a remarkable year for Squadcast, with key wins in the G2 Spring Reports, our acquisition by SolarWinds, and a series of impactful product releases and improvements. Our mission has always been clear: to deliver a unified platform that seamlessly integrates On-Call Management and Incident Response, empowering teams to boost service reliability and productivity without the burden of context switching.

Metrics That Matter: Measuring Developer Productivity in the AI Era

Apr 9, 2025 By Rootly In Rootly

In this episode, Ryan McDonald is joined by Mark Quigley, Head of Platform Engineering at Ninety.io, for a conversation that cuts through the noise around developer productivity metrics and AI. Mark digs into how teams can measure what matters without falling into the trap of turning every measure into a target. He explains how tools like Developer NPS, DORA metrics, and balanced scorecards can help teams optimize for both output and well-being, but only when they are framed with the right intent.
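
As a rough illustration of what the DORA portion of such a scorecard computes, here is a minimal Python sketch. The `Deployment` record, its fields, and the sample history are hypothetical; real inputs would come from your CI/CD pipeline and incident tracker. Time to restore service, the fourth DORA metric, is left out because it needs incident data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from CI/CD and incident tooling.
    committed_at: datetime  # earliest commit included in the deploy
    deployed_at: datetime   # when the change reached production
    caused_failure: bool    # True if the deploy led to an incident or rollback


def dora_summary(deploys: list[Deployment], window_days: int) -> dict:
    """Deployment frequency, lead time for changes and change failure rate.

    Time to restore service needs incident data and is omitted from this sketch.
    """
    if not deploys:
        return {"deploys_per_day": 0.0, "median_lead_time_hours": None,
                "change_failure_rate": None}
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600
                  for d in deploys]
    failures = sum(d.caused_failure for d in deploys)  # bools sum as 0/1
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": failures / len(deploys),
    }


# Illustrative one-week history (made-up data).
start = datetime(2025, 4, 1, 12, 0)
history = [
    Deployment(start - timedelta(hours=4), start, False),
    Deployment(start + timedelta(days=1, hours=-10), start + timedelta(days=1), True),
    Deployment(start + timedelta(days=2, hours=-6), start + timedelta(days=2), False),
    Deployment(start + timedelta(days=4, hours=-2), start + timedelta(days=4), False),
]
print(dora_summary(history, window_days=7))
```

Numbers like these work best as a prompt for discussion rather than a target, in keeping with the episode's warning about turning measures into goals.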

Comparing ELK, Grafana, and Prometheus for Observability

Apr 9, 2025 By Anjali Udasi In Last9

Monitoring and observability are cornerstones of modern infrastructure management. Three popular solutions that often come up in this space are the ELK Stack, Grafana, and Prometheus. This comparison breaks down their key differences, use cases, and integration capabilities to help you determine which tool, or combination of tools, best suits your operational needs.
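
One concrete difference: Prometheus scrapes numeric samples over HTTP (by default from a `/metrics` path) in a plain-text exposition format, whereas the ELK Stack ingests and indexes log documents. The minimal sketch below renders one metric family in that text format using only the Python standard library. The metric name and label values are made-up examples, and in practice you would reach for an official client library such as `prometheus_client` rather than hand-rolling this.

```python
def escape_label_value(value: str) -> str:
    # The text exposition format requires \, " and newline to be escaped in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one metric family in Prometheus' text exposition format.

    help_text is assumed to be a single line without backslashes.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Sorting labels is not required by the format; it keeps output stable.
            pairs = ",".join(f'{key}="{escape_label_value(val)}"'
                             for key, val in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and label values, for illustration only.
print(render_metric(
    "http_requests_total", "Total HTTP requests handled.", "counter",
    [({"method": "GET", "code": "200"}, 1027),
     ({"method": "POST", "code": "500"}, 3)],
), end="")
```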

FastAPI Python for Infra and Ops, Made Simple

Apr 9, 2025 By Anjali Udasi In Last9

If you're working in infrastructure or operations and looking to build reliable APIs, FastAPI might be the Python framework you need. This guide will help you understand how FastAPI can fit into your automation workflows and get you started with practical examples.

Java Util Logging Configuration: A Practical Guide for DevOps & SREs

Apr 8, 2025 By Anjali Udasi In Last9

Setting up proper logging is like having a good navigation system when you're driving through unfamiliar territory. For DevOps engineers and SREs managing Java applications, knowing how to configure the built-in java.util.logging framework is essential and can save you hours of troubleshooting. Let's break down java.util.logging configuration in a way that makes sense, with no fancy jargon, we promise!

OpenTelemetry for Spring: Full Implementation Guide

Apr 8, 2025 By Prathamesh Sonpatki In Last9

Setting up robust observability for your Spring applications is essential for maintaining reliable, high-performing systems. This guide walks you through implementing OpenTelemetry in Spring, with practical advice for common challenges.

How to View and Understand VPC Flow Logs

Apr 8, 2025 By Anjali Udasi In Last9

If you're running workloads in AWS, you've probably heard about VPC Flow Logs. These logs are your eyes and ears for network traffic in your Virtual Private Cloud, and knowing how to check them properly can save you hours of troubleshooting headaches. Whether you're tracking down connectivity issues or monitoring for suspicious activity, this guide will walk you through checking VPC flow logs step by step, with practical examples you can apply today.
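
To make the record layout concrete, here is a minimal Python sketch that splits a flow log line into named fields and prints a one-line summary. It assumes the default (version 2) format with its 14 space-separated fields; custom formats reorder or add fields, so `DEFAULT_FIELDS` would need to match your flow log's format string. The sample record is modeled on the examples in the AWS documentation, and the protocol table covers only a few common IANA numbers.

```python
from datetime import datetime, timezone

# Field order of the default (version 2) flow log format.
DEFAULT_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}
# A few common IANA protocol numbers; anything else is shown as a number.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_flow_record(line: str) -> dict:
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}")
    record = {}
    for field, raw in zip(DEFAULT_FIELDS, values):
        if raw == "-":
            record[field] = None  # NODATA/SKIPDATA records use "-" placeholders
        elif field in INT_FIELDS:
            record[field] = int(raw)
        else:
            record[field] = raw
    return record


def describe(record: dict) -> str:
    if record["log-status"] != "OK":
        return f"{record['interface-id']}: no flow data ({record['log-status']})"
    proto = PROTOCOLS.get(record["protocol"], str(record["protocol"]))
    window_start = datetime.fromtimestamp(record["start"], tz=timezone.utc)
    return (f"{record['action']} {proto} "
            f"{record['srcaddr']}:{record['srcport']} -> "
            f"{record['dstaddr']}:{record['dstport']} "
            f"({record['packets']} packets, {record['bytes']} bytes, "
            f"window start {window_start:%Y-%m-%d %H:%M:%S} UTC)")


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
print(describe(parse_flow_record(sample)))
```

Keeping `-` as `None` rather than dropping those records makes NODATA and SKIPDATA entries visible, which helps when a quiet interface is itself the clue.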

Envoy vs HAProxy: Which Proxy Server Is Right for Your Infrastructure?

Apr 8, 2025 By Faiz Shaikh In Last9

Choosing between Envoy and HAProxy isn't just about picking a proxy server. It's about deciding which tool will route your traffic, balance load, and keep your services running when everything else wants to crash. If you're a DevOps engineer or system admin weighing these options, you're in the right place.

Incident management vs. problem management: A practical guide for SREs

Apr 8, 2025 By Tom Wentworth In Incident.io

In Site Reliability Engineering (SRE), distinguishing incident management from problem management is crucial. While both processes aim to maintain system reliability, they play distinct roles: incident management focuses on quickly resolving immediate disruptions, whereas problem management identifies and fixes root causes to prevent recurrence. Combining the two effectively helps minimize downtime, strengthen system resilience, and foster a proactive approach to operations.

Java GC Logs: How to Read and Debug Fast

Apr 7, 2025 By Anjali Udasi In Last9

When a Java application starts slowing down, garbage collection is often a good place to look. For engineers responsible for keeping systems stable and responsive, understanding GC logs can make a real difference. This guide walks through the basics—what to look for, what the logs mean, and how to troubleshoot common issues—so you can get ahead of problems before they impact performance.
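
For a feel of what reading these logs involves, here is a minimal Python sketch that pulls pause events out of JDK 9+ unified GC logs (enabled with `-Xlog:gc`) and flags slow pauses. It assumes G1-style pause lines like the one in the code comment; the sample lines are illustrative rather than captured from a real application, other collectors or `-Xlog` decorator settings may produce lines this regex will not match, and the 100 ms `slow_ms` threshold is an arbitrary placeholder.

```python
import re
from dataclasses import dataclass

# Matches G1-style pause lines from JDK 9+ unified logging (-Xlog:gc), such as:
# [0.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms
PAUSE_RE = re.compile(
    r"GC\((?P<gc_id>\d+)\) (?P<kind>Pause .+?) "
    r"(?P<before>\d+)(?P<bu>[KMG])->(?P<after>\d+)(?P<au>[KMG])"
    r"\((?P<heap>\d+)(?P<hu>[KMG])\) (?P<ms>[\d.]+)ms"
)
TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


@dataclass
class Pause:
    gc_id: int
    kind: str
    before_mb: float  # heap occupancy before the pause
    after_mb: float   # heap occupancy after the pause
    heap_mb: float    # heap size shown in parentheses
    pause_ms: float


def parse_pauses(lines: list[str]) -> list[Pause]:
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m is None:
            continue  # not a pause line (startup info, other events, etc.)
        pauses.append(Pause(
            gc_id=int(m["gc_id"]),
            kind=m["kind"],
            before_mb=int(m["before"]) * TO_MB[m["bu"]],
            after_mb=int(m["after"]) * TO_MB[m["au"]],
            heap_mb=int(m["heap"]) * TO_MB[m["hu"]],
            pause_ms=float(m["ms"]),
        ))
    return pauses


def summarize(pauses: list[Pause], slow_ms: float = 100.0) -> dict:
    return {
        "count": len(pauses),
        "total_pause_ms": sum(p.pause_ms for p in pauses),
        "worst": max(pauses, key=lambda p: p.pause_ms, default=None),
        "slow": [p for p in pauses if p.pause_ms >= slow_ms],
    }


# Illustrative log lines, not captured from a real application.
log = [
    "[0.010s][info][gc] Using G1",
    "[0.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms",
    "[1.502s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 60M->9M(256M) 5.100ms",
    "[9.870s][info][gc] GC(2) Pause Full (System.gc()) 120M->40M(256M) 142.700ms",
]
report = summarize(parse_pauses(log))
for p in report["slow"]:
    print(f"GC({p.gc_id}) {p.kind}: {p.pause_ms} ms, {p.before_mb:.0f}M -> {p.after_mb:.0f}M")
```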
