
Normalize any logs for Cloud SIEM with Datadog's OCSF processor

Security teams need visibility across every system they defend, including cloud platforms, SaaS applications, security controls, identity providers, and custom services. But those systems all produce logs in different formats, with inconsistent field names and structures. That lack of standardization makes it harder to correlate events, write reusable detections, and investigate incidents quickly.
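The core idea behind normalization can be sketched in a few lines. The mapping below is illustrative only: the field names are loosely modeled on OCSF's Authentication event class (`class_uid` 3002), and the vendor log shapes are made up; the real work is done by Datadog's OCSF processor.

```python
# Two sources report the same kind of event with different field names.
okta_log = {"actor": "alice@example.com", "outcome": "SUCCESS", "eventType": "user.session.start"}
aws_log = {"userIdentity": "alice@example.com", "errorCode": None, "eventName": "ConsoleLogin"}

def to_ocsf_like(source: str, raw: dict) -> dict:
    """Map a vendor-specific login event onto a common, OCSF-like shape.

    class_uid 3002 is OCSF's Authentication class; the rest of the field
    choices here are a simplified sketch, not the full schema.
    """
    if source == "okta":
        user, success = raw["actor"], raw["outcome"] == "SUCCESS"
    elif source == "aws":
        user, success = raw["userIdentity"], raw["errorCode"] is None
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "class_uid": 3002,
        "user": {"email_addr": user},
        "status": "Success" if success else "Failure",
    }

# Both events now share one shape, so a single detection rule covers them.
normalized = [to_ocsf_like("okta", okta_log), to_ocsf_like("aws", aws_log)]
```

Once every source lands in the same shape, one query over `status` and `user.email_addr` works regardless of which system emitted the log.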

Driving AI ROI: How Datadog connects cost, performance, and infrastructure so you can scale responsibly

AI innovation has accelerated faster than most organizations’ ability to monitor and manage it. The shift from experimentation to production-scale workloads has driven a new class of operational challenges: rising GPU costs, opaque model performance, and the difficulty of linking spend to business value. As AI investments grow, executives need a unified way to measure efficiency and return without slowing down innovation.

Detect, diagnose, and resolve network issues easily with CNM Network Health

In many organizations, developers, SREs, network engineers, and security teams work in specialized domains, which can make it hard to establish a shared view of network health. As a result, engineers often struggle to determine when a network problem that originates outside of their domain of expertise is the root cause of an incident. This lack of visibility slows investigations and delays remediation.

Drive business outcomes with Unit Economics in Datadog Cloud Cost Management

See how Datadog turns cloud usage and performance data into actionable business insights by helping teams calculate unit economics to measure and optimize the efficiency of every service. Datadog bridges the gap between cloud costs and business value, helping organizations get the most out of their cloud investment.

How microservice architectures have shaped the usage of database technologies

In the late 2000s, the big question in database design was SQL or NoSQL. While relational databases had long held their ground, document and key-value stores were emerging as serious alternatives. Many predicted a zero-sum, winner-take-all outcome. But when we look at how organizations are using database technologies today, no single tool or category has dominated the landscape.

Securing customer logins with breach intelligence

Account takeovers (ATOs) are one of the most common threats facing online platforms. Attackers buy leaked usernames and passwords on underground markets, then test them at scale across websites, hoping that password reuse will give them easy access. Today, ATOs have grown so sophisticated and fast-moving that manual incident response often can’t keep pace, requiring intelligent defense systems that detect compromised credentials and prevent misuse at scale.
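One common building block for breach intelligence is a k-anonymity range query, the pattern popularized by the Have I Been Pwned API: the client sends only the first five characters of a credential's SHA-1 hash, so the full password (and even its full hash) never leaves the service performing the check. A minimal sketch, using a tiny local stand-in for the breach corpus rather than a real API:

```python
import hashlib

# Toy stand-in for a breach corpus; a real service holds billions of hashes.
BREACHED = {hashlib.sha1(pw.encode()).hexdigest().upper() for pw in ["password1", "qwerty"]}

def k_anon_query(prefix: str) -> set[str]:
    """Server side: return hash suffixes for all breached hashes matching a 5-char prefix."""
    return {h[5:] for h in BREACHED if h.startswith(prefix)}

def is_breached(password: str) -> bool:
    """Client side: check a password while transmitting only a 5-char hash prefix."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    return digest[5:] in k_anon_query(digest[:5])
```

Because many breached hashes share any given 5-character prefix, the server learns which bucket was queried but not which credential was being checked.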

Turning errors into product insight: How early-stage teams can connect engineering data to user impact

Early-stage engineering teams ship fast and learn in production. While speed is a competitive advantage, it can also lead to a high volume of noisy signals, like stack traces, timeouts, and dashboards full of red. Some of those problems can affect your users and revenue, but many don’t.

A FinOps engineer's guide to governing custom metrics

This guest blog post is authored by Dieter Matzion, a seasoned cloud practitioner who has operated exclusively in public cloud environments since 2013, with experience at leading technology companies including Google, Netflix, Intuit, and Roku. Custom metrics play a crucial role in enabling teams to monitor their applications and businesses. The flexibility of these metrics allows engineers to measure what matters most to their domain.

Day 2 with Cilium: Small configurations that keep large clusters boring

Operating Cilium at a small scale is straightforward. You install the Helm chart, choose a routing mode, and apply a few network policies. Day 1 is about getting packets to flow. Day 2 is about keeping them boring. At Datadog, we run Cilium across hundreds of Kubernetes clusters, tens of thousands of nodes, and hundreds of thousands of pods in multiple clouds. When operating at this scale, small configuration choices stop being minor details and start becoming risk multipliers.

Python memory profiling: Common pitfalls and how to avoid them

Continuous profiling has established itself as a core observability practice, so much so that we’ve referred to it as the fourth pillar of observability. But despite the capabilities and growing adoption of continuous profiling, it can still be confusing to approach profiling as a newcomer and correctly apply it to different troubleshooting scenarios.

Centrally set up and scale monitoring of your infrastructure and apps with Datadog Fleet Automation

Setting up and scaling observability across large, distributed environments often requires platform and SRE teams to coordinate access to infrastructure hosts and switch between configuration management tools and product-specific documentation. These tasks increase setup time and create delays in establishing visibility into critical services in Datadog. As teams expand their infrastructure, they need to coordinate Datadog configuration changes in a consistent and auditable way.

From Zero to Open Source Contributor

Never contributed to open source and feeling intimidated? Same. Before joining Datadog, Alessandro had zero open source experience. Now he's a regular contributor to Apache Iceberg. Here's exactly how he got started. Step 1: Join the Slack community and answer user questions. Step 2: Look for "good first issue" tags in the repo. Step 3: Remember that opening bug reports and doing code reviews count as contributions too.

The Hidden Costs and Concerns of Iceberg Maintenance

Everyone talks about how great Apache Iceberg is, but nobody warns you about this: without proper maintenance, your tables will bloat, queries will slow down, and your catalog will run out of memory. Here are the 4 critical operations you MUST run regularly. Expiring snapshots prevents metadata bloat (Datadog learned this the hard way with catalog memory pressure). Deleting orphan files cleans up failed writes. Compacting data files keeps streaming workloads fast. Compacting manifests optimizes query planning.
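With Spark, each of the four operations above maps to a built-in Iceberg stored procedure. A sketch that composes the `CALL` statements (the catalog and table names are hypothetical; in practice you would pass each string to `spark.sql`):

```python
from datetime import datetime, timedelta

def maintenance_plan(catalog: str, table: str) -> list[str]:
    """Build the four routine Iceberg maintenance calls as Spark SQL statements."""
    cutoff = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d 00:00:00")
    return [
        # 1. Expire old snapshots to keep metadata from bloating.
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{cutoff}')",
        # 2. Remove files left behind by failed writes.
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        # 3. Compact small data files (important for streaming workloads).
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # 4. Compact manifests to speed up query planning.
        f"CALL {catalog}.system.rewrite_manifests('{table}')",
    ]

plan = maintenance_plan("demo", "db.events")
```

The seven-day retention window is an arbitrary example; the right cutoff depends on how far back your consumers need time travel.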

Improve log utilization with Datadog log exclusion filters | Datadog Tips & Tricks

Want to make your logs easier to work with? Excluding unneeded logs from indexing reduces noise and may reduce log management costs. In this video, you’ll see how to improve log utilization with Datadog Log Patterns and log exclusion filters, then set up an alert to track ingestion spikes.

Training Foundation Models on a Trillion Data Points with Apache Iceberg

Training an AI foundation model on over a trillion data points sounds impossible without hitting your production systems. Here's how Datadog did it with Apache Iceberg for their time series forecasting model TOTO. The key challenge: extracting massive historical observability data (metrics spanning years) and running incremental preprocessing pipelines without overwhelming production services. Iceberg solved this by providing schema governance, consistency guarantees, and seamless integration with ML tools like Ray and PyTorch.
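The incremental pattern described here can be sketched abstractly: record a watermark for the last snapshot processed, and on each run consume only rows appended since. The in-memory table below is a hypothetical stand-in; with Iceberg you would drive this with incremental reads between snapshot IDs.

```python
# Minimal sketch of incremental processing: snapshot_id -> rows appended in that snapshot.
table = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}

def process_incrementally(last_seen: int) -> tuple[list[str], int]:
    """Return (rows newer than last_seen, new watermark) without rescanning old data."""
    new_ids = sorted(sid for sid in table if sid > last_seen)
    rows = [row for sid in new_ids for row in table[sid]]
    return rows, (new_ids[-1] if new_ids else last_seen)

batch1, watermark = process_incrementally(0)          # first run: everything so far
batch2, watermark = process_incrementally(watermark)  # second run: nothing new yet
```

Because each run touches only the delta, the preprocessing pipeline's load stays proportional to new data rather than to years of history.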

Monitor your Kubernetes operators to keep applications running smoothly

The performance of your Kubernetes operators often influences the behavior of the applications they manage. Operators automate the day-to-day management of your applications by executing critical activities, which may include scaling replicas, performing upgrades, and recovering from failures. For example, a PostgreSQL operator can ensure that standby servers are always deployed, that the database’s failover is correctly configured, and that data is backed up on schedule.

From performance to impact: Bridging frontend teams through shared context

Connecting day-to-day development work to real user outcomes can be challenging. As a result, engineers and product teams often struggle to effectively prioritize projects together. While the goal of improving user experience (UX) is the same, each team relies heavily on different—and often siloed—forms of monitoring to understand their app, creating a disconnect in metrics and visualizations that can be hard to communicate.

How to Track Cloud Costs in Real-Time Instead of Waiting Days

Tired of waiting days to see your AWS bill spike? Datadog solved this problem using Apache Iceberg to deliver real-time cloud cost visibility, updating every 15 minutes instead of waiting for billing data. Here's how it works: they sync real-time resource inventory (EC2 instances, Kubernetes pods) into Iceberg tables, then use Trino to join those snapshots with unit pricing data. The result? FinOps teams can catch cost anomalies before they become budget disasters.
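The join at the heart of this is conceptually simple. In Datadog's pipeline it is a Trino query over Iceberg tables, but the shape is roughly the sketch below (the instance types and hourly prices are made-up illustrations, not real rate cards):

```python
# One 15-minute resource inventory snapshot, joined against unit pricing.
inventory = [
    {"resource": "i-abc", "type": "m5.large", "hours": 0.25},
    {"resource": "i-def", "type": "m5.large", "hours": 0.25},
    {"resource": "i-ghi", "type": "p4d.24xlarge", "hours": 0.25},
]
unit_price_per_hour = {"m5.large": 0.096, "p4d.24xlarge": 32.77}  # illustrative prices

def snapshot_cost(inv: list[dict], prices: dict) -> float:
    """Estimated spend for one snapshot window: sum of hours x unit price."""
    return sum(r["hours"] * prices[r["type"]] for r in inv)

cost = snapshot_cost(inventory, unit_price_per_hour)  # spend for this 15-minute window
```

Summing these snapshot-level estimates every 15 minutes is what lets a spend anomaly surface the same hour it starts, instead of on the next billing export.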

This Month in Datadog - December 2025

For our last episode of 2025, we’re focusing on Datadog releases announced at AWS re:Invent. Join Jeremy to see how you can manage logs at petabyte scale in your infrastructure, eliminate unneeded costs in Amazon S3 buckets, build agentic workflows, and detect credential leaks. Later in the episode, Scott spotlights how you can connect your AI agents to Datadog tools and context with our MCP Server.

Highlights from AWS re:Invent 2025: Making sense of applied AI, trust, and going faster

After four days at AWS re:Invent, a 65,000-step marathon through the latest installment of this 13-year-old cloud pilgrimage, with 60,000 attendees spread across five Las Vegas campuses, we’re all a little dehydrated but significantly wiser. The volume of announcements felt less like a single flood and more like a river branching into three powerful currents. Making sense of this massive technological convergence requires zooming out.

How Datadog Manages 50,000 Apache Iceberg Tables at Scale

Think managing a few database tables is hard? Try 50,000 production Iceberg tables storing petabytes of data with 8 million scans per day. In this clip, Datadog's platform team reveals the architecture choices behind their managed Iceberg implementation that serves hundreds of internal engineering teams.

Datadog at AWS re:Invent, Bits AI SRE, MCP Server, CloudPrem, and more | This Month in Datadog

Get a closer look at features we announced at AWS re:Invent in the latest episode of This Month in Datadog. Tune in for spotlights of Bits AI SRE, now generally available, and Datadog’s MCP Server, which connects AI agents to our platform by ingesting prompts and mapping them to Datadog resources and data. As always, This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

Datadog on Apache Iceberg

Historically, Datadog has relied on technologies like Snowflake and Apache Spark on raw Parquet files (lacking consistent table structure) to power internal analytics and data science at scale. As usage grew across product teams, more features depended on data science teams, and our datasets grew to include more telemetry data, these systems became complex to manage and govern both technically and financially. The need for a more flexible and scalable solution led Datadog to adopt Apache Iceberg, an open source table format for data lakes that brings reliability and performance while remaining SQL-friendly.

Keep service ownership up to date with Datadog Teams' GitHub integration

Engineering organizations depend on clear team ownership to maintain reliable services and move quickly. But as codebases expand and teams shift, answering basic questions—Who owns this service? Who should be paged in an incident? Are teams meeting operational standards?—becomes harder.

Automate infrastructure operations with Datadog Infrastructure Management

Many organizations struggle to track how their cloud infrastructure changes over time. Modern environments span tens of thousands of resources across hundreds of accounts and multiple clouds. Application teams add new services and regions at a rapid pace, increasing the number and variety of resources that need to be managed. These shifts can cause infrastructure configurations to drift from a well-architected state, increasing the risk of service reliability issues and unexpected cloud spend.

Datadog Bits AI SRE: Your new teammate for on-call shifts

Bits AI SRE is an always-on SRE agent built to handle complex troubleshooting and late-night alerts. Developed against thousands of real-world incidents and powered by Datadog’s platform, Bits AI SRE analyzes your entire stack, tests hypotheses, and identifies root causes in minutes. Resolve faster, get back to sleep sooner, and give your on-call team the confidence and capacity they need.

Observability in the AI age: Datadog's approach

Ten years ago, Datadog was a single-product company focused on breaking down the silos between dev and ops. As the shift towards the cloud accelerated and organizations transitioned to the new DevOps model, we set out to develop an observability platform that would enable these teams to safely scale faster and answer the essential questions about their services: are they available, secure, compliant, performant, and cost-efficient?

Optimize Kubernetes cluster cost with Datadog Cluster Autoscaler

Running Kubernetes at scale almost always means paying for more compute than you need. To protect reliability, platform and application teams typically overprovision nodes early in development and keep scaling up as they add features and workloads. They are often reluctant to move to smaller or different instance types without a clear picture of how those changes will affect performance or availability. The result is a fleet of underutilized nodes that silently inflate your cloud bill.

Optimize Your Oracle Cloud (OCI) Spend with Datadog Cloud Cost Management

Support for Oracle Cloud Infrastructure (OCI) is now live in Datadog Cloud Cost Management. In this short demo, you’ll learn how to get granular visibility into OCI cost and usage by service, compartment, tag, and resource tier; uncover savings opportunities by combining cost data with observability metrics like CPU, memory, and storage utilization; and set up anomaly monitors and budgets to avoid cost overruns, especially for high-risk workloads like AI and GPU training.

Accelerate investigations with AI-powered log parsing

When debugging production issues, investigating security incidents, or analyzing network traffic, engineers and analysts need not only to find the right logs but to make sense of all the dense, unstructured data generated by different systems. Logs rarely ship neatly laid out in a way that facilitates filtering, faceting, or graphing for every possible scenario. As a result, teams often find themselves writing regular expressions or custom parsers on the fly, which can be error-prone and time-consuming.
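The hand-written approach being replaced looks something like this. The log line and pattern below are illustrative of the general problem: each new source means another brittle regex to write and maintain.

```python
import re

# An unstructured nginx-style access log line (format is illustrative).
line = '203.0.113.7 - - [12/Jan/2026:10:02:33 +0000] "GET /api/v1/users HTTP/1.1" 502 173'

# Hand-written pattern: brittle, and rewritten for every new log source.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

match = PATTERN.match(line)
event = match.groupdict() if match else {}
# Structured fields are now available for filtering, faceting, and graphing.
```

Any change to the source format (a new field, different quoting) silently breaks the match, which is exactly the maintenance burden AI-assisted parsing aims to remove.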

Monitor Claude Code adoption in your organization with Datadog's AI Agents Console

AI coding assistants are quickly becoming a core part of software engineering workflows, helping developers write, refactor, and review code faster. But without effective monitoring, it can be difficult to know whether these tools are performing reliably and proving useful to engineers. As organizations scale their use of tools like Claude Code, key questions emerge.