Datadog

New York City, NY, USA
2010
  |  By David M. Lentz
Etcd plays a critical role in your Kubernetes setup: it stores the ever-changing state of your cluster and its objects, and the API server uses this data to manage cluster resources. As your applications thrive and your Kubernetes clusters see more traffic, etcd handles an increasing amount of data. But etcd’s storage space is limited: the recommended maximum is 8 GiB, and a large and dynamic cluster can easily generate enough data to reach that limit.
  |  By Candace Shamieh
Pinecone is a vector database that helps users build and deploy generative AI applications at scale. Whether using its serverless architecture or a hosted model, Pinecone allows users to store, search, and retrieve the most meaningful information from their company data with each query, sending only the necessary context to Large Language Models (LLMs). By providing the ability to search and retrieve contextual data, Pinecone enables you to reduce LLM hallucinations and enhance data security.
  |  By Candace Shamieh
Microservices architectures empower individual teams to choose their own programming language, tools, and technologies, resulting in more independence and the ability to develop and release features faster. While there are various types of integration patterns that can facilitate microservice communication, many organizations choose to adopt event-driven architectures (EDAs) because of their scalability, agility, and resilience.
  |  By Datadog
On the December episode of This Month in Datadog, Jeremy Garcia (VP of Technical Community and Open Source) covers Kubernetes Active Remediation, Datadog IaC Security, and a trio of new features for monitoring AWS resources. Later in the episode, Natasha Goel (Product Manager) spotlights Datadog Cloud Cost Management for OpenAI. Also featured is a short recap of Datadog at KubeCon North America and AWS re:Invent 2024.
  |  By Lauren Lowe
moovingon.ai is a platform that consolidates alerts, incidents, audits, runbooks, and other resources for 24/7 network operations center (NOC) engineering teams. These teams often have to work collaboratively to maintain uptime for mission-critical cloud infrastructure and applications and need specialized resources to facilitate investigations in the event of an issue.
  |  By Andrew Krug
Whether or not you made the journey to this year’s AWS re:Invent, there’s always a variety of great announcements lost amid an action-packed week of keynotes, breakouts, expo hall demos, and networking sessions. No need to worry: we’re always happy to be a big part of the re:Invent experience and share our observations with you. You can also join us on December 17, 2024, for a re:Invent re:Cap livestream.
  |  By Samantha Scaglione
When you have a complex IT environment with many disparate tools, data sources, and teams, alert noise becomes overwhelming. This can delay incident response and cause missed alerts, ultimately leading to critical incidents and outages. Datadog Event Management’s Event Correlation groups and deduplicates events and alerts, reducing noise and helping response teams act on alerts faster.
  |  By Sriram Raman
Organizations often struggle to maintain visibility and control over their distributed cloud infrastructure, where changes in a single resource can have cascading effects throughout the system and potentially cause disruptions. In these environments, infrastructure changes that lead to incidents are often hard to troubleshoot—especially when teams are using disparate tools with siloed data—leading to longer resolution times, more downtime, and negative business outcomes.
  |  By Mahashree Rajendran
Organizations today rely on cloud object storage to power diverse workloads, from data analytics and machine learning pipelines to content delivery platforms. But as data volumes explode and storage patterns become more complex, teams often struggle to understand and proactively optimize their storage utilization. When issues arise—such as unexpected costs or performance bottlenecks—these teams frequently lack the visibility needed to quickly identify and resolve root causes.
  |  By Danny Driscoll
Amazon Elastic Container Service (ECS) is a container orchestration service that enables you to efficiently deploy new applications or modernize existing ones by migrating them to a containerized environment. Building on ECS gives you the flexibility, scalability, and security that containers offer, but also presents challenges in monitoring and troubleshooting your applications and infrastructure.
  |  By Datadog
Did you miss this year’s re:Invent? Or maybe you were onsite but too busy deep diving on certifications, new products, and networking. Don’t worry: the Datadog team is streaming right to your home on December 17 to recap all of the highlights from the event. Join Andrew Krug from Datadog’s Technical Community along with a host of AWS guests to hear about exciting announcements from AWS re:Invent 2024, Datadog’s latest product launches, and a rundown of the best on-demand sessions that you’ll want to make sure to tune in to.
  |  By Datadog
Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. To learn more about Datadog and start a free 14-day trial, visit the Datadog website. This month, we put the Spotlight on Datadog Cloud Cost Management for OpenAI.
  |  By Datadog
Datadog Database Monitoring unifies query, application, and database telemetry in one platform, enabling teams to easily identify bottlenecks, understand database load, optimize query performance, uncover costly queries, and correlate database and application telemetry.
  |  By Datadog
Cloud spending continues to grow, but managing costs effectively remains a challenge for many organizations. In this video, Datadog Senior Product Manager Kayla Taylor dives into our recent State of Cloud Costs report—which analyzed AWS cloud cost data from hundreds of organizations—to understand the key factors driving cloud expenses. We explore the impact of adopting emerging compute technologies like Arm-based processors, GPUs, and AI capabilities, how usage patterns and previous-generation technologies affect cloud costs, and the role of AWS discount programs in cost management.
  |  By Datadog
In this video, we continue exploring how Kubernetes handles authentication, focusing on bootstrap and static token authentication.
  |  By Datadog
Datadog operates dozens of Kubernetes clusters, tens of thousands of hosts, and millions of containers across a multi-cloud environment, spanning AWS, Azure, and Google Cloud. With over 2,000 engineers, we needed to ensure that every developer and application could securely and efficiently access resources across these various cloud providers.
  |  By Datadog
This video aims to showcase how developers can self-serve from an application to simplify the management of their AWS cloud resources. Rather than switching between tools or reaching out to another team for help, developers can take action directly from their observability tool, enabling faster resolution of application issues. We will demonstrate how to build a simple app that allows them to minimize disruptions by quickly taking action on their SQS queues in AWS, using insights provided by Datadog.
  |  By Datadog
Temporal is an open source platform to build resilient and reliable distributed systems. Datadog started using Temporal in 2020 as the foundation for our internal software delivery platform. Since then, it has been widely adopted as a platform that any engineering team can use to build their systems. In this Datadog On episode, Ara Pulido chats with Loïc Minaudier, Senior Software Engineer in the Atlas team, responsible for providing a developer platform on top of Temporal, and Allen George, Engineering Manager in the Datadog Workflows team.
  |  By Datadog
On This Month in Datadog, we’re spotlighting LLM Observability’s native integration with Google Gemini, which automatically captures the LLM requests your application makes to Gemini models.
  |  By Datadog
Datadog Service Catalog automatically consolidates real-time observability data and internal engineering knowledge about all of your services into a unified view.
  |  By Datadog
As Docker adoption continues to rise, many organizations have turned to orchestration platforms like ECS and Kubernetes to manage large numbers of ephemeral containers. Thousands of companies use Datadog to monitor millions of containers, which enables us to identify trends in real-world orchestration usage. We're excited to share 8 key findings of our research.
  |  By Datadog
The elasticity and nearly infinite scalability of the cloud have transformed IT infrastructure. Modern infrastructure is now made up of constantly changing, often short-lived VMs or containers. This has elevated the need for new methods and new tools for monitoring. In this eBook, we outline an effective framework for monitoring modern infrastructure and applications, however large or dynamic they may be.
  |  By Datadog
Where does Docker adoption currently stand and how has it changed? With thousands of companies using Datadog to track their infrastructure, we can see software trends emerging in real time. We're excited to share what we can see about true Docker adoption.
  |  By Datadog
Build an effective framework for monitoring AWS infrastructure and applications, however large or dynamic they may be. The elasticity and nearly infinite scalability of the AWS cloud have transformed IT infrastructure. Modern infrastructure is now made up of constantly changing, often short-lived components. This has elevated the need for new methods and new tools for monitoring.
  |  By Datadog
Like a car, Elasticsearch was designed to allow you to get up and running quickly, without having to understand all of its inner workings. However, it's only a matter of time before you run into engine trouble here or there. This guide explains how to address five common Elasticsearch challenges.
  |  By Datadog
Monitoring Kubernetes requires you to rethink your monitoring strategies, especially if you are used to monitoring traditional hosts such as VMs or physical machines. This guide prepares you to effectively approach Kubernetes monitoring in light of its significant operational differences.

Datadog is the essential monitoring platform for cloud applications. We bring together data from servers, containers, databases, and third-party services to make your stack entirely observable. These capabilities help DevOps teams avoid downtime, resolve performance issues, and ensure customers are getting the best user experience.

See it all in one place:

  • See across systems, apps, and services: With turnkey integrations, Datadog seamlessly aggregates metrics and events across the full DevOps stack.
  • Get full visibility into modern applications: Monitor, troubleshoot, and optimize application performance.
  • Analyze and explore log data in context: Quickly search, filter, and analyze your logs for troubleshooting and open-ended exploration of your data.
  • Build real-time interactive dashboards: More than summary dashboards, Datadog offers all high-resolution metrics and events for manipulation and graphing.
  • Get alerted on critical issues: Datadog notifies you of performance problems, whether they affect a single host or a massive cluster.

Modern monitoring & analytics. See inside any stack, any app, at any scale, anywhere.