Operations | Monitoring | ITSM | DevOps | Cloud

Manage and optimize your OCI costs with Datadog Cloud Cost Management

Engineering teams need to deliver reliable, secure, and high-performing applications, all while keeping costs under control. But engineers often lack visibility into cloud cost data, relying on finance-driven reports that they receive only after the billing cycle closes. Without daily cost insights alongside observability data, they don’t know until it’s too late that an infrastructure change caused a significant cost increase.

How we use Datadog to get comprehensive, fine-grained visibility into our email delivery system

Visibility into email performance is indispensable to any organization that counts on its ability to reach people through their inboxes, including Datadog. SREs, FinOps, and many other teams rely on email as a critical channel for communications from our platform, including monitor alerts, usage reports, and service account notifications. At Datadog, we depend on the visibility provided by our integrations for Mailgun, SendGrid, and Amazon SES to optimize our email performance and ensure deliverability.

Keep stakeholders informed with Datadog Status Pages

When incidents occur, clear communication can be just as important as fast remediation. Your internal teams need timely updates to stay aligned, and your users want to know what is happening and when they can expect a fix. Without a reliable way to proactively share updates, support teams can get flooded with tickets and customer trust can erode. Datadog Status Pages, now generally available, makes it easy to keep everyone informed through a public or internal web page during outages.

Scaling Datadog observability: 1,000 integrations and counting

Integrations have always been central to the Datadog platform, enabling customers to collect the data they need directly from the technologies they use every day. By unifying signals from infrastructure and applications to security and SaaS applications, teams gain both high-level visibility and the ability to drill into the details that matter the most. With more than 1,000 integrations now available, the Datadog ecosystem continues to expand alongside the platforms our customers rely on.

Monitor Slurm with Datadog

Slurm (Simple Linux Utility for Resource Management) is an open source workload management system used to schedule jobs and manage resources for high-performance computing (HPC) Linux clusters. It ensures that jobs and resources are scheduled fairly and efficiently and is scalable across large clusters, an issue that native Linux process management tools struggle with.

Ship features faster and safer with Datadog Feature Flags

Releasing new features is one of the highest-stakes moments in the software delivery life cycle. Even with CI/CD pipelines in place, plenty of things can still go wrong when a feature goes live for actual users. Most feature flagging tools operate in isolation from important observability tooling, forcing engineers to monitor changes across multiple disconnected systems to fully understand their impact. This slows down development and increases the chance of missing critical issues.

Model your architecture with custom entities in the Datadog Software Catalog

Every software organization has its own unique architecture and workflows. Beyond services and APIs, teams rely on internal libraries, CI/CD jobs, data pipelines, AI agents, and more to keep systems running smoothly. But as architectures grow more complex and interconnected, it can become difficult to keep track of all the structural dependencies and interactions in one place.

Monitor your data pipelines with Airflow lineage

In complex data pipelines with dozens of jobs and intermediary datasets, it can be difficult to effectively monitor how data travels and changes through various steps. When tracking issues in these pipelines, you need visibility into upstream components where the root cause may originate from, as well as downstream datasets and consumers of data that may be experiencing further impacts.

Proactively monitor Kerberos-authenticated web apps and APIs with Datadog Synthetics

When employee authentication fails or becomes unreliable, users can lose access to the critical systems they need. Authentication enables access to internal tools like HR applications, finance portals, and internal dashboards, so even short outages can interrupt day-to-day work, while persistent issues increase the risk of broader operational disruption.