Deploying OpenTelemetry Organizationally: From Proof of Concept to In-Production at Scale

Observability involves telling a coherent story about an entire system. Over the years, video streaming service Pluto TV has had to navigate many storytellers, in the form of observability vendors, tools, and formats, before settling on OpenTelemetry to analyze and compare features across its many destination platforms. During this presentation, you'll see how Bharathi Ramachandran, Engineering Manager at Pluto TV, used OpenTelemetry to implement his initial proof of concept and get his entire organization shipping observability data at scale.

Changing Perspectives: A Deep Dive into the Security Posture of 600+ Real-World AWS Environments

Earlier this year, Datadog released the “State of AWS Security” study, which examined real-world data from more than 600 organizations and AWS accounts to understand the security posture of global AWS users who also leverage the Datadog Cloud Security Platform. Join Datadog’s Christophe Tafani-Dereeper and Andrew Krug as they explore some important insights from this study, such as the top ways organizations are breached on AWS and how tooling like Datadog Cloud Security Posture Management can help.

What is Kafka?

Apache Kafka is a popular open source platform for streaming, storing, and processing high volumes of data. In this video, we break down how Kafka works and how it’s able to provide you with a reliable, scalable, and highly performant service for managing events. We also touch on some key resources for effectively monitoring your Kafka deployments via Datadog.

When Cloud Native Stacks Misbehave - Pitfalls and Lessons Learned | Itiel Shwartz (Komodor)

In this session, Itiel Shwartz will demonstrate common failure scenarios, both app- and infra-related. We will laugh a little and cry a little, and then cover best practices for monitoring, observability, and troubleshooting, such as metrics, distributed tracing, logging, network visualization, and more. But cheer up! We’ll wrap up by introducing some helpful tools for finding and fixing issues as fast as possible.

Bringing "Blameless" to Traffic Court | J. Paul Reed (Release Engineering Approaches)

What do modern incident analysis techniques and moving violations have in common? This Quick Bite tells the story of taking the same retrospective techniques that the most innovative technology companies in the world use to understand their operational incidents... to traffic court, to help us all understand what really happened. What happened next? Come find out!

Datadog on gRPC

Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework, which makes gRPC a critical component of Datadog’s reliability. As teams investigated incidents related to their services, they discovered that some of them were gRPC-related. But were there common patterns to those incidents? Could we use them to learn more about gRPC and how to use it better?

Investigate critical alerts on the go with the Datadog mobile app

The Datadog mobile app provides real-time visibility into critical alerts, incidents, and application performance metrics across your entire environment, helping you troubleshoot directly from your mobile device. On-call engineers can quickly evaluate the conditions that triggered an alert, determine its urgency, and decide the next course of action—anywhere, anytime.