New York City, NY, USA
Jun 2, 2020   |  By David M. Lentz
Datadog’s AWS integration brings you deep visibility into key AWS services like EC2 and Lambda. We’re excited to announce that we’ve simplified the process for installing the AWS integration. If you’re not already monitoring AWS with Datadog, or if you need to monitor additional AWS accounts, our 1-click integration lets you get started in minutes.
May 29, 2020   |  By Justin Massey
Google Cloud Platform (GCP) is a suite of cloud computing services for deploying, managing, and monitoring applications. A critical part of deploying reliable applications is securing your infrastructure. Google Cloud Audit Logs record the who, where, and when for activity within your environment, providing a breadcrumb trail that administrators can use to monitor access and detect potential threats across your resources (e.g., storage buckets, databases, service accounts, virtual machines).
May 27, 2020   |  By Steve Harrington
Microsoft Azure is a cloud computing platform for building, deploying, and managing global-scale applications. With a wide range of offerings, including dozens of different services, Azure provides tools for users to create large and sophisticated systems for hosting any type of workload. But with the huge number of configuration options and resource types, understanding the health and performance of your applications in Azure can be challenging.
May 11, 2020   |  By Stephanie Niu
Datadog dashboards provide immediate visibility and insight into your environments. Setting template variables enables you to filter your dashboard graphs on the fly to visualize specific sets of tagged objects. Now, with saved views, you can save sets of frequently used template variables in order to easily find the data you most care about with just a few clicks.
May 8, 2020   |  By Pierre Cariou
Logs are an invaluable source of information, as they provide insights into the severity and possible root causes of problems in your system. But it can be hard to get the right level of visibility from your logs while keeping costs to a minimum. Systems that process large volumes of logs consume more resources and therefore make up a higher percentage of your overall monitoring budget. Further, log throughput can be highly variable, creating unexpected resource usage and financial costs.
Oct 29, 2018   |  By Datadog
As Docker adoption continues to rise, many organizations have turned to orchestration platforms like ECS and Kubernetes to manage large numbers of ephemeral containers. Thousands of companies use Datadog to monitor millions of containers, which enables us to identify trends in real-world orchestration usage. We’re excited to share 8 key findings of our research.
Oct 29, 2018   |  By Datadog
The elasticity and nearly infinite scalability of the cloud have transformed IT infrastructure. Modern infrastructure is now made up of constantly changing, often short-lived VMs or containers. This has elevated the need for new methods and new tools for monitoring. In this eBook, we outline an effective framework for monitoring modern infrastructure and applications, however large or dynamic they may be.
Oct 1, 2018   |  By Datadog
Where does Docker adoption currently stand and how has it changed? With thousands of companies using Datadog to track their infrastructure, we can see software trends emerging in real time. We’re excited to share what we can see about true Docker adoption.
Oct 1, 2018   |  By Datadog
Build an effective framework for monitoring AWS infrastructure and applications, however large or dynamic they may be. The elasticity and nearly infinite scalability of the AWS cloud have transformed IT infrastructure. Modern infrastructure is now made up of constantly changing, often short-lived components. This has elevated the need for new methods and new tools for monitoring.
Sep 1, 2018   |  By Datadog
Like a car, Elasticsearch was designed to allow you to get up and running quickly, without having to understand all of its inner workings. However, it’s only a matter of time before you run into engine trouble here or there. This guide explains how to address five common Elasticsearch challenges.
Jun 2, 2020   |  By Datadog
In part 2 of this two-part series, you’ll learn how to use Log Patterns to quickly create log exclusion filters and reduce the number of low-value logs you are indexing. Datadog’s Logging with Limits™ feature allows you to selectively determine which logs to index after ingesting all of your logs. Meanwhile, the Log Patterns feature can quickly isolate groups of low-value logs.
Jun 2, 2020   |  By Datadog
In this video, you’ll learn how to generate metrics from log event attributes so that you can filter your logs more effectively and begin monitoring, graphing, and alerting on the new metric immediately. Generating metrics from logs is a powerful way to monitor attributes that are parsed from your logs.
Jun 2, 2020   |  By Datadog
When Datadog decided two years ago to move its infrastructure platform to Kubernetes, we didn’t expect to find so many roadblocks, but reliably ingesting trillions of data points per day requires pushing the limits of cloud computing. Creating and managing dozens of clusters, each with thousands of nodes and operating in several clouds, was a challenging but rewarding learning experience. In this episode, Ara Pulido, Developer Advocate, will chat with Laurent Bernaille, Staff Engineer at Datadog and part of the team that created Datadog’s Kubernetes platform. We’ll cover the challenges we found creating and scaling Datadog’s Kubernetes platform and how we overcame them.
Jun 1, 2020   |  By Datadog
As a company, Datadog ingests trillions of data points per day. Kafka is the messaging persistence layer underlying many of our high-traffic services. Consequently, our Kafka usage is quite high: double-digit gigabytes per second of bandwidth and the need for petabytes of high-performance storage, even for relatively short retention windows. In this episode, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They’ll share their strategy for scaling Kafka, explain how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters. You’ll leave with lessons learned while scaling persistent storage on modern orchestrated infrastructure, and actionable insights you can apply at your organization.
May 26, 2020   |  By Datadog
In this session, we start with the basics of SRE, including some common terminology and theory, then dive into practical examples, including lessons learned from our own journey here at Datadog. We discuss the relationship between SRE and DevOps, what success looks like (and how to measure it), and how to identify and nurture both internal and external talent in order to build a cross-functional team. SRE is a large, complex topic, so the session ends with a live Q&A and a deep dive into some great topics.