
Datadog

Generative AI and Observability Automation - Sajid Mehmood & Michael Gerstenhaber

One of the biggest challenges in observability is separating the signal from the noise. As artificial intelligence (AI) tools become more powerful and accessible, they have generated a lot of buzz about the role AI can play in the performance and reliability of our technical systems and of the teams that build and operate them. In this fireside chat, Michael Gerstenhaber (Datadog VP of Product) and Sajid Mehmood (Datadog VP of Engineering) will sift through the hype to discuss what generative AI and large language models (LLMs) will really mean for the future of observability, and how they can benefit your teams today.

Right Size, Right Performance, Right Time

It’s been said that “premature optimization is the root of all evil.” At the same time, many engineers have had to work with software so riddled with technical debt and inefficiency that optimization is practically impossible and a complete rewrite is required. So when is the right time to optimize? In this panel session, we’ll talk with engineering leaders and architects about their approach to software optimization, when to do it, and how to design systems that scale and stay performant.

CTO Fireside Chat

Building large-scale technical systems is hard, but building and scaling high-performing technical organizations is even more difficult. In this session, Datadog Co-founder and CTO Alexis Lê-Quôc will sit down with Prashant Pandey, Head of Engineering at Asana, to discuss their approach to engineering leadership. They’ll share hard-learned lessons from their long careers to help you cultivate better technical teams, covering topics such as staying current with new technologies, enabling innovation, shipping modern ML- and AI-based features, and scaling teams.

Efficiency and Effectiveness

With unlimited money, most technology problems become easy to solve. But how do you design, build, and operate large-scale, performant systems without breaking the bank? In this session, Chandru Subramanian (Director of Engineering, Runtime Efficiency at Datadog) and Neil Innes (Sr. Engineering Manager, DevOps at FanDuel) will discuss how they balance efficiency and effectiveness to save money while still meeting key goals.

The Darkside of GraphQL

GraphQL is a query language for APIs that provides a powerful and efficient way to query and manipulate data. For all its power and versatility, however, GraphQL can be vulnerable to certain security threats. In this presentation, we will discuss the security vulnerabilities associated with GraphQL, from the basics to more advanced threats, and how best to protect against them. Afterward, attendees will have a better understanding of these vulnerabilities and of the steps needed to guard against them.
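One widely used protection against deeply nested, and therefore expensive, GraphQL queries is a maximum query depth. The Python below is a hypothetical and deliberately crude sketch of that idea, not a technique taken from the talk: `selection_depth` and `enforce_max_depth` are names invented here, and the function estimates depth by counting nested braces outside string literals and comments instead of actually parsing the query.

```python
def selection_depth(query: str) -> int:
    """Estimate the nesting depth of a GraphQL query by counting braces (heuristic)."""
    depth = 0
    max_depth = 0
    in_string = False
    in_comment = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_comment:
            # A '#' comment runs to the end of the line.
            if ch == "\n":
                in_comment = False
        elif in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == "#":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
        i += 1
    return max_depth


def enforce_max_depth(query: str, limit: int) -> None:
    """Reject the query before execution if its estimated depth exceeds limit."""
    depth = selection_depth(query)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")


# A self-referential "friends of friends" query nests four levels deep.
nested = "{ user { friends { friends { name } } } }"
print(selection_depth(nested))  # 4
enforce_max_depth(nested, limit=5)  # within the limit, so no error
```

Because the sketch counts every brace, input-object literals in arguments also add to the estimate, and fragment spreads are not expanded, so an attacker could hide depth inside fragments. A production server would apply the same limit to the parsed query instead.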

Innovating with Faster, Safer Experimentation

Experimentation is the key to innovation. But experiments come with risks, not just of failure, but of wasted time, effort, and money. I’ll share the experimental approach that NTT DOCOMO, Japan’s largest wireless provider, takes to build digital products that customers love. I’ll also present examples from experiments we performed on NTT DOCOMO’s Smart-life website that improved the user experience and significantly increased conversion rates. In this session, you’ll learn how to reduce the risk of experiments and iterate faster to improve your services.

Container Security Fundamentals - Linux Namespaces (Part 4): The User Namespace

In this video, we continue our examination of Linux namespaces by looking at how the user namespace can be used to decouple the user ID inside a container from the user ID on the host. This allows a container to run as the root user without the risks of being root on the host. To learn more, read our blog on Datadog’s Security Labs site.
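The user namespace achieves this decoupling by mapping ranges of IDs inside the namespace to ranges outside it. On Linux these ranges appear in `/proc/<pid>/uid_map`, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The Python below models that translation as a pure function so it runs anywhere. The mapping `0 100000 65536` is an example chosen for illustration, and `parse_uid_map` and `to_host_uid` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# Each mapping is (first ID inside the namespace, first ID outside it, range length),
# mirroring the three columns of a uid_map line.
Mapping = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[Mapping]:
    """Parse uid_map-style text into (inside, outside, length) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            mappings.append((inside, outside, length))
    return mappings


def to_host_uid(container_uid: int, mappings: List[Mapping]) -> Optional[int]:
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for inside, outside, length in mappings:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Root (UID 0) in the container corresponds to the unprivileged UID 100000 on the host.
example = parse_uid_map("0 100000 65536\n")
print(to_host_uid(0, example))      # 100000
print(to_host_uid(1000, example))   # 101000
print(to_host_uid(70000, example))  # None (outside the mapped range)
```

With a mapping like this, a process that is root inside the container holds only an ordinary, unprivileged identity from the host's point of view.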

Key questions to ask when setting SLOs

Many organizations rely on service level objectives (SLOs) to help them gauge the reliability of their products. By setting SLOs that define clear and measurable reliability targets, businesses can ensure they are delivering positive end-user experiences to their customers. Clearly defined SLOs also make it much easier for businesses to understand what tradeoffs they may have to make in order to deliver those specific experiences.
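One common way to make a reliability target measurable is to express it as an error budget: the amount of failure the target still allows. The Python below is a minimal sketch assuming a time-based SLO over a fixed window and a request-based SLO counted as good versus total events. The 99.9% target, 30-day window, and request counts are example values, not figures from the source, and the function names are hypothetical.

```python
def error_budget_minutes(target: float, window_days: int) -> float:
    """Minutes of unavailability a time-based SLO allows over the window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - target) * window_minutes


def remaining_budget_fraction(target: float, good: int, total: int) -> float:
    """Share of a request-based error budget left, given good and total event counts.

    1.0 means no budget has been spent; 0 or below means the SLO has been missed.
    """
    if total <= 0 or target >= 1.0:
        raise ValueError("need total > 0 and target < 1.0")
    allowed_bad = (1.0 - target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


# A 99.9% target over 30 days allows about 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999, 30), 1))  # 43.2

# 1,000,000 requests with 400 failures spends 40% of a 99.9% budget.
print(round(remaining_budget_fraction(0.999, 999_600, 1_000_000), 2))  # 0.6
```

Expressing the target as a budget makes tradeoffs easier to reason about: for instance, a team might weigh a risky release differently depending on how much of the budget remains in the current window.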

How to monitor CoreDNS with Datadog

In Part 1 of this series, we introduced you to the key metrics you should be monitoring to ensure that you get optimal performance from CoreDNS running in your Kubernetes clusters. In Part 2, we showed you some tools you can use to monitor CoreDNS. In this post, we’ll show you how you can use Datadog to monitor metrics, logs, and traces from CoreDNS alongside telemetry from the rest of your cluster, including the infrastructure it runs on.

Tools for collecting metrics and logs from CoreDNS

In Part 1 of this series, we looked at key metrics you should monitor to understand the performance of your CoreDNS servers. In this post, we’ll show you how to collect and visualize these metrics. We’ll also explore how CoreDNS logging works and show you how to collect CoreDNS logs to get even deeper visibility into your Deployment.
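CoreDNS exposes its metrics in the Prometheus text format through its `prometheus` plugin, which by default serves them on port 9153 at `/metrics`. As a minimal sketch of what a collector does with that output, the Python below sums one metric across its label sets, optionally filtering by label values. The sample lines are hand-written for illustration (real output carries more metrics and labels), `sum_metric` is a hypothetical helper, and the parser handles only simple `name{labels} value` lines rather than the full exposition format.

```python
import re
from typing import Dict

# Metric name, optional {label set}, then the sample value (a trailing timestamp is ignored).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_metric(text: str, name: str, **label_filter: str) -> float:
    """Sum every sample of `name` whose labels match all key=value pairs in label_filter."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group(1) != name:
            continue
        labels: Dict[str, str] = dict(LABEL_RE.findall(match.group(2) or ""))
        if all(labels.get(k) == v for k, v in label_filter.items()):
            total += float(match.group(3))
    return total


sample = """\
# TYPE coredns_dns_responses_total counter
coredns_dns_responses_total{rcode="NOERROR",server="dns://:53",zone="."} 120
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://:53",zone="."} 7
coredns_dns_responses_total{rcode="SERVFAIL",server="dns://:53",zone="."} 3
"""

print(sum_metric(sample, "coredns_dns_responses_total"))                   # 130.0
print(sum_metric(sample, "coredns_dns_responses_total", rcode="SERVFAIL"))  # 3.0
```

Against a live server, you would first fetch the text from the metrics endpoint (for example with `urllib.request.urlopen`); dividing the SERVFAIL total by the overall total then gives a rough server-failure ratio.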