Latest Videos

Datadog on Kafka

As a company, Datadog ingests trillions of data points per day. Kafka is the messaging persistence layer underlying many of our high-traffic services. Consequently, our Kafka usage is quite high: double-digit gigabytes per second of bandwidth and petabytes of high-performance storage, even for relatively short retention windows. In this episode, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They'll share their strategy for scaling Kafka, explain how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters. You'll leave with lessons learned while scaling persistent storage on modern orchestrated infrastructure, and actionable insights you can apply at your organization.

Introduction to Site Reliability Engineering

In this session, we start with the basics of SRE, including some common terminology and theory, then dive into practical examples, including lessons learned from our own journey here at Datadog. We discuss the relationship between SRE and DevOps, what success looks like (and how to measure it), and how to identify and nurture both internal and external talent in order to build a cross-functional team. SRE is a large, complex topic, so the session ends with a live Q&A and a deep dive into selected topics.

How To Monitor Containers in Real-Time with Datadog Live Containers | Datadog Tips & Tricks

In this video, you’ll learn how to use Datadog’s Live Container View to monitor and troubleshoot the performance of the containers underlying your applications. Datadog makes it easy to monitor ephemeral, containerized infrastructure. Using this view, you can sort and group your containers by tags or labels imported from Kubernetes, such as container name.

How to Create a Graph Using Tags and Time Aggregation | Datadog Tips & Tricks

In this video, you’ll learn how to use tag-based grouping and time aggregation (with the rollup function) to create actionable time-series graphs. Datadog offers various ways to manipulate your metric graphs so that you can create graphs that are specific and actionable for all of your use cases. Two methods of doing this—as explored in this video—are tag-based grouping and time aggregation.
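
To make the two ideas concrete, here is a minimal Python sketch of what tag-based grouping combined with time aggregation does to a set of raw points. It is an illustration only, not Datadog's implementation: the function name `rollup_by_tag`, the `(timestamp, value, tags)` point layout, and the sample hosts are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def rollup_by_tag(points, tag_key, interval_s, aggregator=mean):
    """Group points by one tag, then aggregate each group into fixed time buckets.

    points: iterable of (timestamp_s, value, tags), where tags is a dict.
    Returns {tag_value: [(bucket_start_s, aggregated_value), ...]}, sorted by time.
    This is a hypothetical helper for illustration, not a Datadog API.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, value, tags in points:
        group = tags.get(tag_key, "N/A")          # points without the tag share one group
        bucket_start = ts - (ts % interval_s)     # align the timestamp to its bucket
        buckets[group][bucket_start].append(value)
    return {
        group: [(start, aggregator(vals)) for start, vals in sorted(series.items())]
        for group, series in buckets.items()
    }


# Made-up sample data: two hosts reporting a metric at irregular times.
points = [
    (0, 10.0, {"host": "web-1"}),
    (30, 20.0, {"host": "web-1"}),
    (60, 40.0, {"host": "web-1"}),
    (15, 5.0, {"host": "web-2"}),
    (75, 7.0, {"host": "web-2"}),
]

# One series per host, one averaged point per 60-second bucket.
result = rollup_by_tag(points, "host", 60)
```

Grouping decides how many lines the graph draws (one per tag value), while the bucket interval and aggregator decide how many points each line has and what each point means.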

How to Import Kubernetes Labels as Tags | Datadog Tips & Tricks

In this video, you’ll learn how to turn Kubernetes node labels and pod labels into tags in Datadog in order to correlate metrics, traces, and logs back to Kubernetes deployments. Using labels for Kubernetes objects—such as pods or nodes—is key to organizing and making sense of your deployments. Datadog can automatically bring your Kubernetes labels from your clusters into the Datadog platform as tags, regardless of whether you’re using on-prem Kubernetes or a cloud-based service such as AKS, EKS, or GKE.

How to Use Browser Tests to Monitor Web App User Journeys | Datadog Tips & Tricks

In part 2 of this two-part series, you’ll learn how to create Datadog Browser Tests to replicate user journeys and verify that your web applications are responsive and functioning properly at all times. In part 1 of this series, you learned how Datadog’s API tests can be used to check API and website uptime. Datadog Browser Tests take this a step further, allowing you to replicate entire user journeys and transactions through your web applications. This is done with our browser recorder: simply click “Start Recording” and click through your application to record a test.

Identifying EC2 Right Sizing Opportunities for Cost Optimization | Datadog Tips & Tricks

In this video, you’ll learn how to identify right-sizing opportunities for your EC2 instances using Datadog metric dashboards. Optimizing your cloud footprint for cost efficiency can be a huge task, especially for large and growing environments. Using time-series data and toplists, Datadog dashboards let you see chronically underutilized EC2 instances in your AWS environment. Template variables let you filter EC2 instances by team and instance type, so you can quickly identify the scope of cost-saving opportunities across your organization.
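
As a rough illustration of what "chronically underutilized" could mean in practice, the Python sketch below flags instances whose CPU utilization stays under a threshold for most of the sampled period. It is not taken from the video: the function name, the 10% threshold, the 90% fraction, and the instance IDs are all hypothetical values chosen for this example.

```python
from statistics import mean


def chronically_underutilized(cpu_by_instance, threshold_pct=10.0, min_fraction=0.9):
    """Return instance IDs that are below threshold_pct in at least
    min_fraction of their samples, ordered from lowest average CPU upward.

    cpu_by_instance: {instance_id: [cpu_percent, ...]}.
    Both thresholds are assumed defaults for illustration, not recommendations.
    """
    flagged = []
    for instance_id, samples in cpu_by_instance.items():
        if not samples:
            continue  # no data: cannot judge utilization
        low_count = sum(1 for s in samples if s < threshold_pct)
        if low_count / len(samples) >= min_fraction:
            flagged.append((mean(samples), instance_id))
    return [instance_id for _, instance_id in sorted(flagged)]


# Made-up CPU samples (percent) for three hypothetical instances.
cpu_by_instance = {
    "i-aaa": [3.0, 4.0, 5.0, 2.0],
    "i-bbb": [55.0, 60.0, 8.0, 70.0],
    "i-ccc": [1.0, 1.0, 2.0, 1.0],
}

candidates = chronically_underutilized(cpu_by_instance)
```

Requiring low utilization across most samples, rather than a low average alone, keeps instances with occasional heavy bursts off the candidate list.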