Datadog

Introducing our AWS 1-click integration

Datadog’s AWS integration brings you deep visibility into key AWS services like EC2 and Lambda. We’re excited to announce that we’ve simplified the process for installing the AWS integration. If you’re not already monitoring AWS with Datadog, or if you need to monitor additional AWS accounts, our 1-click integration lets you get started in minutes.

Using Log Patterns to Create Log Exclusion Filters | Datadog Tips & Tricks

In part 2 of this two-part series, you’ll learn how to use Log Patterns to quickly create log exclusion filters and reduce the number of low-value logs you are indexing. Datadog’s Logging without Limits™ feature lets you ingest all of your logs and then selectively choose which ones to index. Meanwhile, the Log Patterns feature can quickly isolate groups of low-value logs.

How to Generate Metrics from Logs | Datadog Tips & Tricks

In this video, you’ll learn how to generate metrics from log event attributes so that you can filter your logs more effectively and begin monitoring, graphing, and alerting on the new metric immediately. Generating metrics from logs is a powerful way to monitor attributes that are parsed from your logs.
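The idea of turning log events into a metric can be sketched in a few lines of Python. This is only a conceptual illustration, not Datadog's implementation or API: the function name `count_metric_from_logs`, the log record layout (`timestamp`, `service`, `status`), and the 60-second interval are all assumptions made for the example.

```python
from collections import Counter


def count_metric_from_logs(logs, matches, group_by, interval_s=60):
    """Count matching log events per time bucket and per attribute value.

    Hypothetical helper: `logs` is a list of dicts with a numeric
    "timestamp" (seconds) plus arbitrary parsed attributes.
    """
    counts = Counter()
    for event in logs:
        if not matches(event):
            continue  # only events that pass the filter contribute
        # Align the timestamp to the start of its bucket.
        bucket = event["timestamp"] - event["timestamp"] % interval_s
        counts[(bucket, event.get(group_by))] += 1
    return dict(counts)


# Made-up sample events, already parsed into attributes.
sample_logs = [
    {"timestamp": 0, "service": "web", "status": "error"},
    {"timestamp": 10, "service": "web", "status": "error"},
    {"timestamp": 20, "service": "api", "status": "ok"},
    {"timestamp": 70, "service": "api", "status": "error"},
]

# A count of error events, grouped by the "service" attribute.
error_counts = count_metric_from_logs(
    sample_logs, lambda e: e["status"] == "error", "service"
)
print(error_counts)
```

Each key in the result pairs a bucket start time with an attribute value, which is the shape a time-series graph or alert would consume.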

Datadog on Kubernetes

When Datadog decided two years ago to move its infrastructure platform to Kubernetes, we didn’t expect to find so many roadblocks, but reliably ingesting trillions of data points per day requires pushing the limits of cloud computing. Creating and managing dozens of clusters, each with thousands of nodes and operating across several clouds, was a challenging but rewarding learning experience. In this episode, Ara Pulido, Developer Advocate, will chat with Laurent Bernaille, Staff Engineer at Datadog and part of the team that created Datadog’s Kubernetes platform. We’ll cover the challenges we found creating and scaling Datadog’s Kubernetes platform and how we overcame them.

Datadog on Kafka

As a company, Datadog ingests trillions of data points per day. Kafka is the messaging persistence layer underlying many of our high-traffic services. Consequently, our Kafka usage is quite high: double-digit gigabytes per second of bandwidth and a need for petabytes of high-performance storage, even for relatively short retention windows. In this episode, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They’ll share their strategy for scaling Kafka, explain how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters. You’ll leave with lessons learned while scaling persistent storage on modern orchestrated infrastructure, and actionable insights you can apply at your organization.

Best practices for monitoring GCP audit logs

Google Cloud Platform (GCP) is a suite of cloud computing services for deploying, managing, and monitoring applications. A critical part of deploying reliable applications is securing your infrastructure. Google Cloud Audit Logs record the who, where, and when for activity within your environment, providing a breadcrumb trail that administrators can use to monitor access and detect potential threats across your resources (e.g., storage buckets, databases, service accounts, virtual machines).

Enhanced Azure monitoring with Datadog

Microsoft Azure is a cloud computing platform for building, deploying, and managing global-scale applications. With a wide range of offerings, including dozens of different services, Azure provides tools for users to create large and sophisticated systems for hosting any type of workload. But with the huge number of configuration options and resource types, understanding the health and performance of your applications in Azure can be challenging.

Introduction to Site Reliability Engineering

In this session, we start with the basics of SRE, including some common terminology and theory, then dive into practical examples, including lessons learned from our own journey here at Datadog. We discuss the relationship between SRE and DevOps, what success looks like (and how to measure it), and how to identify and nurture both internal and external talent in order to build a cross-functional team. SRE is a large, complex topic, so the session ends with a live Q&A and a deep dive into some great topics.

How to Create a Graph Using Tags and Time Aggregation | Datadog Tips & Tricks

In this video, you’ll learn how to use tag-based grouping and time aggregation (with the rollup function) to create actionable time-series graphs. Datadog offers various ways to manipulate your metric graphs so that they are specific and actionable for all of your use cases; this video explores two of them, tag-based grouping and time aggregation.
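To make the two ideas concrete, here is a minimal Python sketch of grouping points by a tag and rolling them up into fixed time buckets. It illustrates the concept only and is not Datadog's rollup function or query syntax: `rollup_by_tag`, the point layout (`ts`, `value`, `tags`), and the sample data are hypothetical.

```python
from collections import defaultdict


def rollup_by_tag(points, tag, interval_s, aggregate):
    """Group points by a tag value, then aggregate each time bucket.

    Hypothetical helper: each point is a dict with "ts" (seconds),
    "value", and a "tags" dict.
    """
    buckets = defaultdict(list)
    for point in points:
        # Time aggregation: align each timestamp to its bucket start.
        bucket_start = point["ts"] - point["ts"] % interval_s
        # Tag-based grouping: one series per distinct tag value.
        buckets[(point["tags"][tag], bucket_start)].append(point["value"])
    return {key: aggregate(values) for key, values in buckets.items()}


def mean(values):
    return sum(values) / len(values)


# Made-up sample points from two hosts.
points = [
    {"ts": 0, "value": 10.0, "tags": {"host": "a"}},
    {"ts": 30, "value": 20.0, "tags": {"host": "a"}},
    {"ts": 60, "value": 40.0, "tags": {"host": "a"}},
    {"ts": 15, "value": 5.0, "tags": {"host": "b"}},
]

avg_per_host = rollup_by_tag(points, "host", 60, mean)
max_per_host = rollup_by_tag(points, "host", 60, max)
print(avg_per_host)
print(max_per_host)
```

Swapping the aggregation (`mean`, `max`, `sum`) or the interval changes how coarse and how smooth each per-tag series looks, which is the trade-off time aggregation lets you control on a graph.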

How To Monitor Containers in Real-Time with Datadog Live Containers | Datadog Tips & Tricks

In this video, you’ll learn how to use Datadog’s Live Container View to monitor and troubleshoot the performance of the containers underlying your applications. Datadog makes it easy to monitor ephemeral, containerized infrastructure. Using this view, you can sort and group your containers by tags or labels imported from Kubernetes, such as container name.