Latest Posts

Introducing wildcard-filtered metric queries

Tags are essential for helping your teams quickly filter and find the information they need within the large volume of data generated by your cloud infrastructure. Because modern environments are always changing, with hosts and containers continuously being added or replaced, you need to be able to dynamically scope your queries so that you're not rewriting the same searches over and over again.

Monitor Apache Airflow with Datadog

Apache Airflow is an open source system for programmatically creating, scheduling, and monitoring complex workflows, including data processing pipelines. Originally developed by Airbnb in 2014, Airflow is now a part of the Apache Software Foundation and has an active community of contributing developers. Airflow represents workflows as Directed Acyclic Graphs (DAGs), which are made up of tasks written in Python. This allows Airflow users to programmatically build and modify their workflows.
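
To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's own API; it relies on the standard library's `graphlib` module, and the task names (`extract`, `transform`, `load`, `report`) are hypothetical. It only illustrates why an acyclic dependency graph yields a valid execution order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid execution order for an acyclic dependency graph."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph contains no cycle."""
    try:
        run_order(deps)
        return True
    except CycleError:
        return False


order = run_order(dependencies)
print(order)
```

Because every task here has exactly one upstream dependency, only one ordering is possible. A graph with a cycle (for example, `extract` depending on `report`) has no valid ordering, which is why workflows are modeled as acyclic graphs.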

How to monitor Kubernetes audit logs

Datadog operates large-scale Kubernetes clusters in production across multiple clouds. Along the way, audit logs have been extremely helpful for tracking user interactions with the API server, debugging issues, and getting clarity into our workloads. In this post, we’ll show you how to leverage the power of Kubernetes audit logs to get deep insight into your clusters.

Monitor email workflows with Datadog Browser Tests

Monitoring your application from end to end is important for ensuring that core functionalities work as designed. Datadog’s browser tests help you verify that key user workflows—such as signing up for a new account—are consistent across devices and locations. Within these workflows, email often plays a key role in onboarding users and providing customers with important information about their accounts and application activity, such as profile changes and order confirmations.

Monitor SNMP with Datadog

As your on-premises network infrastructure grows in size and complexity, monitoring thousands of devices becomes a challenge. Whether you're monitoring firewalls in a branch office or the routing and switching fabric in your datacenter that carries all customer transactions, visibility into all points of your infrastructure is critical for network maintenance.

Monitor Vault metrics and logs

HashiCorp Vault is a tool for managing secrets, which are sensitive data such as passwords, certificates, and API keys. Vault allows you to encrypt your secrets, control access to them, and audit activity to see who has requested data from your Vault. Datadog already monitors the status of your Vault servers. For example, you can configure the Vault integration to automatically notify you if a Vault server is unexpectedly sealed, or if there is a leader change in your Vault cluster.

Monitor ClickHouse with Datadog

ClickHouse is an open source database management system that was originally developed as a backend for Yandex's Metrica analytics platform. ClickHouse is column-oriented, meaning that it can quickly scan through ranges of values in a single column without touching irrelevant values in other columns. This makes ClickHouse well suited for online analytical processing (OLAP).
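
The difference between row and column orientation can be sketched in a few lines of plain Python. This is an illustration of the storage idea only, not of how ClickHouse is implemented; the records and the column names (`user_id`, `country`, `latency_ms`) are made up for the example.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "country": "FR", "latency_ms": 120},
    {"user_id": 2, "country": "US", "latency_ms": 80},
    {"user_id": 3, "country": "JP", "latency_ms": 100},
]

# Column-oriented layout: one contiguous list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def mean_latency_row_oriented(records):
    """Visit every record, even though only one field is needed."""
    return sum(record["latency_ms"] for record in records) / len(records)


def mean_latency_columnar(cols):
    """Read a single column; the other columns are never touched."""
    values = cols["latency_ms"]
    return sum(values) / len(values)


row_mean = mean_latency_row_oriented(rows)
col_mean = mean_latency_columnar(columns)
print(row_mean, col_mean)
```

Both functions return the same result, but the columnar version reads only the `latency_ms` values. For an aggregate over many rows and wide tables, that access pattern is what makes a column-oriented layout a good fit for analytical queries.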

Key metrics for monitoring AWS Lambda

AWS Lambda is a compute service that enables you to build serverless applications without the need to provision or maintain infrastructure resources (e.g., server capacity, network, security patches). AWS Lambda is event-driven, meaning that functions are triggered in response to events from other services, such as API calls from Amazon API Gateway or changes to a DynamoDB table.