
Day 2 with Cilium: Small configurations that keep large clusters boring

Operating Cilium at a small scale is straightforward. You install the Helm chart, choose a routing mode, and apply a few network policies. Day 1 is about getting packets to flow. Day 2 is about keeping them boring. At Datadog, we run Cilium across hundreds of Kubernetes clusters, tens of thousands of nodes, and hundreds of thousands of pods in multiple clouds. When operating at this scale, small configuration choices stop being minor details and start becoming risk multipliers.

Text-to-Alert: Generating Netdata Alerts from Natural Language

Netdata has an incredibly powerful alerting engine. But this can sometimes be a double-edged sword: the flexibility to build highly specific, intelligent alerts is immense, but mastering its syntax can feel like learning a new language. We’ve heard this from so many of you. You tell us that configuring alerts is often the steepest part of the learning curve, a task that falls to the one “Netdata expert” on the team who has spent the time digging through the documentation.

A Year in Internet Analysis: 2025

This year-end wrap-up covers topics ranging from BGP security (including ASPA and excessive AS-SETs) and geopolitical events (Ukraine’s IPv4 exodus, the Iran internet shutdown, and Red Sea cable cuts) to the year’s most significant outages (TikTok, the Spain/Portugal blackout, and cloud failures at AWS, Azure, and Cloudflare). Plus, we explore Starlink’s new Community Gateways and revisit the evolving landscape of AS ranking and OTT service tracking.

Debug Faster with Chrome + Rollbar Debugging Assistant

Context switching is one of the biggest hidden productivity killers in debugging. Jumping between multiple open browser tabs slows momentum and increases cognitive load, especially when you’re trying to diagnose an issue under pressure. Google Chrome's new split-screen feature, paired with Rollbar Debugging Assistant, enables a faster, more focused way to troubleshoot errors without constantly losing your place.

Rovo Dev Auto Closing Vulnerabilities | Bitbucket Blitz | Atlassian

Learn how Atlassian uses Rovo Dev and Bitbucket to automatically find and fix code vulnerabilities. Over three months, this capability saved our developers thousands of hours and cut issue resolution time in half, allowing them to focus on building software and solving problems for our customers. This technology is available to all of our customers. Learn how it works, and start using it yourself.

The Observability Stack is Collapsing: Why Context-First Data is the Only Path to AI-Powered Root Cause Analysis

By Bill Balnave, VP of Customer Success at Mezmo

The core promise of modern observability is simple: cut Mean Time To Resolution (MTTR). Yet, despite a boom in tooling and investment over the last four years, the data tells a sobering story: our industry is actually getting worse at finding and resolving issues. Dashboards, once our trusted guide, have become the starting point for a chaotic "dashboard hunt" that rarely leads to the definitive root cause.

Transforming Symfony monolith to multi-apps: a step-by-step guide

This blog post is based on a talk by Florent Huck, Developer Advocate at Upsun, at SymfonyCon 2023. We used AI tools for transcription and to improve the structure and clarity of the content. The journey from a single monolithic application to a multi-application architecture doesn't have to be daunting. At a recent developer conference, Florent from Upsun's Developer Relations team shared a practical step-by-step guide on how to refactor a monolith into multiple applications using Upsun.

How Istio Ambient Mode Delivers Real World Solutions

For years, platform teams have known what a service mesh can provide: strong workload identity, authorization, mutual TLS authentication and encryption, fine-grained traffic control, and deep observability across distributed systems. In theory, Istio checked all the boxes. In practice, though, many teams hit a wall. Across industries like financial services, media, retail, and SaaS, organizations told a similar story. They wanted mTLS between services to meet regulatory or security requirements.

Site24x7's Kubernetes monitoring | Proactive, scalable, AI-powered

Kubernetes drives modern cloud-native applications, but its distributed nature creates visibility and performance challenges at scale. In this video, discover how Site24x7 provides real-time monitoring, AI-powered anomaly detection, and scalability for Kubernetes environments, helping you proactively manage resources and resolve issues faster. Key features of Site24x7 Kubernetes Monitoring: Whether you're running a single Kubernetes cluster or managing multiple environments, Site24x7 helps you ensure peak performance and make faster decisions with minimal manual intervention.