Latest Posts

Day 2 with Cilium: Small configurations that keep large clusters boring

Dec 18, 2025 By Candace Shamieh In Datadog

Operating Cilium at a small scale is straightforward. You install the Helm chart, choose a routing mode, and apply a few network policies. Day 1 is about getting packets to flow. Day 2 is about keeping them boring. At Datadog, we run Cilium across hundreds of Kubernetes clusters, tens of thousands of nodes, and hundreds of thousands of pods in multiple clouds. When operating at this scale, small configuration choices stop being minor details and start becoming risk multipliers.

Python memory profiling: Common pitfalls and how to avoid them

Dec 18, 2025 By Bowen Chen In Datadog

Continuous profiling has established itself as a core observability practice, so much so that we’ve referred to it as the fourth pillar of observability. But despite the capabilities and growing adoption of continuous profiling, it can still be confusing to approach profiling as a newcomer and correctly apply it to different troubleshooting scenarios.

Monitor your Kubernetes operators to keep applications running smoothly

Dec 15, 2025 By David Lentz In Datadog

The performance of your Kubernetes operators often influences the behavior of the applications they manage. Operators automate the day-to-day management of your applications by executing critical activities, which may include scaling replicas, performing upgrades, and recovering from failures. For example, a PostgreSQL operator can ensure that standby servers are always deployed, that the database’s failover is correctly configured, and that data is backed up on schedule.

From performance to impact: Bridging frontend teams through shared context

Dec 15, 2025 By Addie Beach In Datadog

Connecting day-to-day development work to real user outcomes can be challenging. As a result, engineers and product teams often struggle to effectively prioritize projects together. While the goal of improving user experience (UX) is the same, each team relies heavily on different—and often siloed—forms of monitoring to understand their app, creating a disconnect in metrics and visualizations that can be hard to communicate.

This Month in Datadog - December 2025

Dec 11, 2025 By Datadog In Datadog

For our last episode of 2025, we’re focusing on Datadog releases announced at AWS re:Invent. Join Jeremy to see how you can manage logs at petabyte scale in your infrastructure, eliminate unneeded costs in Amazon S3 buckets, build agentic workflows, and detect credential leaks. Later in the episode, Scott spotlights how you can connect your AI agents to Datadog tools and context with our MCP Server.

Highlights from AWS re:Invent 2025: Making sense of applied AI, trust, and going faster

Dec 11, 2025 By Andrew Krug In Datadog

After four days of AWS re:Invent—a 65,000-step marathon that included 60,000 attendees spread across five Las Vegas campuses—and navigating the latest installment of this 13-year-old cloud pilgrimage, we’re all a little dehydrated but significantly wiser. The volume of announcements felt less like a single flood and more like a river branching into three powerful currents. Making sense of this massive technological convergence requires zooming out.

Keep service ownership up to date with Datadog Teams' GitHub integration

Dec 9, 2025 By Roxanne Moslehi In Datadog

Engineering organizations depend on clear team ownership to maintain reliable services and move quickly. But as codebases expand and teams shift, answering basic questions—Who owns this service? Who should be paged in an incident? Are teams meeting operational standards?—becomes harder.

Automate infrastructure operations with Datadog Infrastructure Management

Dec 4, 2025 By Jessie Wu In Datadog

Many organizations struggle to track how their cloud infrastructure changes over time. Modern environments span tens of thousands of resources across hundreds of accounts and multiple clouds. Application teams add new services and regions at a rapid pace, increasing the number and variety of resources that need to be managed. These shifts can cause infrastructure configurations to drift from a well-architected state, increasing the risk of service reliability issues and unexpected cloud spend.

Observability in the AI age: Datadog's approach

Dec 2, 2025 By Yanbing Li In Datadog

Ten years ago, Datadog was a single-product company focused on breaking down the silos between dev and ops. As the shift towards the cloud accelerated and organizations transitioned to the new DevOps model, we set out to develop an observability platform that would enable these teams to safely scale faster and answer the essential questions about their services: are they available, secure, compliant, performant, and cost-efficient?

Optimize Kubernetes cluster cost with Datadog Cluster Autoscaler

Dec 2, 2025 By Allie Rittman In Datadog

Running Kubernetes at scale almost always means paying for more compute than you need. To protect reliability, platform and application teams typically overprovision nodes early in development and keep scaling up as they add features and workloads. They are often reluctant to move to smaller or different instance types without a clear picture of how those changes will affect performance or availability. The result is a fleet of underutilized nodes that silently inflate your cloud bill.
