Introducing Runner Replicas: Scalable, Reliable Automation for Modern Ops

When you’re responsible for the reliability of complex systems, the execution layer of your automation is not something you want to think about—it should just work. Whether you’re deploying code, patching servers, or responding to an incident at 3 a.m., your automation engine should be as resilient and scalable as the infrastructure it’s operating on.

What Is RabbitMQ And How Do You Manage It With Kubernetes?

The world of Kubernetes and RabbitMQ evolves rapidly. Our popular 2022 post laid the groundwork for HA deployments; now, join us for the crucial 2025 update to ensure your architecture remains cutting-edge. Organizations continue their shift from monolithic architecture (where all the code that builds the application exists as a single entity) to microservices architecture.

How to Boost Revenue and Cut Network Spending with Kentik Traffic Costs

Network operators across the digital ecosystem are under pressure to cut costs while protecting revenue. This post explores three practical use cases where Kentik Traffic Costs turns traffic insight into commercial intelligence, so teams can negotiate smarter, protect margins, and boost profitability.

The Compliance Shortcut: Automation as the New Operating System for Resilience

For years, compliance has been synonymous with checklists, manual reporting, and time-consuming audits. That definition no longer holds. In our September 2025 webinar, Patrick Hubbard, Technical Marketing Director, led a conversation with JB Baker, Vice President of Product Engineering, and Marc Jensen, Channel Sales Engineer. Together, they showed how automation is transforming compliance into something far more strategic: the foundation of modern resilience.

Paving the way for a new era: Mezmo's Active Telemetry

The world of software development has fundamentally changed. We've moved from monthly releases to continuous delivery measured in minutes, and the rise of AI means velocity is no longer just a goal—it's a requirement for survival. But this relentless speed has exposed a critical flaw in how we approach observability. The industry relies on a "store first, ask questions later" model where you collect every log, metric, and trace, and then hope to find the root cause when something breaks.

JFrog and ServiceNow: Accelerate Trusted Software Application Development

Today’s software organizations can’t make tradeoffs between speed and trust – you need both to succeed. But juggling them is tough. Moving too fast can lead to security vulnerabilities and compliance issues, while moving too slow means your competitors beat you to market. This tension creates friction that slows down every release, a problem that is rooted in your software pipeline.

What's New in InfluxDB 3.5: Explorer Dashboards, Cache Querying, and Expanded Control

InfluxDB 3.5 is now available for both Core and Enterprise, along with updates to the new Explorer UI that make it easier to save, organize, and query your data. This release highlights the biggest updates since our 3.4 release, including Explorer Dashboards in beta, new cache querying capabilities, and stronger operational tools for managing clusters. InfluxDB 3 Core is free and open source, optimized for recent data, and licensed under MIT and Apache 2.

Ship features faster and safer with Datadog Feature Flags

Releasing new features is one of the highest-stakes moments in the software delivery life cycle. Even with CI/CD pipelines in place, plenty of things can still go wrong when a feature goes live for actual users. Most feature flagging tools operate in isolation from important observability tooling, forcing engineers to monitor changes across multiple disconnected systems to fully understand their impact. This slows down development and increases the chance of missing critical issues.

How to boost observability ROI with continuous profiling and Grafana Drilldown

For the longest time, observability was centered around logs, metrics, and traces, but the growth of more complex systems has made continuous profiling another essential part of maintaining healthy systems. It shows resource usage and latency down to the code level, which helps teams improve performance.

The Unit Economics Of Watering My Lawn: A Lesson On Runaway AI Costs

My wife and I spent hours this summer at home digging in the dirt. We planted new shrubs and perennials and created a small vegetable garden. We spread many square yards of fresh topsoil and grass seed over areas of lawn that needed rejuvenation. It turns out, I should have done all that landscaping with a FinOps leader’s mindset — before my water bill tripled when I wasn’t looking.