Latest Videos

Auditing Your Automation's Access: Using More Automation

Between CI/CD pipelines, container orchestrators, and developer debugging tools, more and more automation is needed to scale your systems. But how do you know if that automation is accessing the right systems at the right time? And how do you ensure that your automation is safe from exploits by unauthorized users?

What is Kafka?

Apache Kafka is a popular open source platform for streaming, storing, and processing high volumes of data. In this video, we break down how Kafka works and how it’s able to provide you with a reliable, scalable, and highly performant service for managing events. We also touch on some key resources for effectively monitoring your Kafka deployments via Datadog.

When Cloud Native Stacks Misbehave - Pitfalls and Lessons Learned | Itiel Shwartz (Komodor)

In this session, Itiel Shwartz will demonstrate common failure scenarios, both app- and infra-related. We will laugh a little and cry a little, and then cover best practices for monitoring, observability, and troubleshooting, such as metrics, distributed tracing, logging, network visualization, and more. But cheer up! We’ll wrap up by introducing some helpful tools for finding and fixing issues as fast as possible.

Bringing "Blameless" to Traffic Court | J. Paul Reed (Release Engineering Approaches)

What do modern incident analysis techniques and moving violations have in common? This Quick Bite tells the story of taking the same retrospective techniques the most innovative technology companies in the world use to understand their operational incidents... to traffic court, to help us all understand what really happened. What happened next? Come find out!

Datadog on gRPC

Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework, which makes gRPC a critical component for Datadog’s reliability. As teams investigated incidents related to their services, they discovered that some of them were gRPC related. But were there common patterns to those incidents? Could we use them to learn more about gRPC and how to use it better?

Investigate critical alerts on the go with the Datadog mobile app

The Datadog mobile app provides real-time visibility into critical alerts, incidents, and application performance metrics across your entire environment, helping you troubleshoot directly from your mobile device. On-call engineers can quickly evaluate the conditions that triggered an alert, determine its urgency, and decide the next course of action—anywhere, anytime.