Datadog

How Gremlin monitors its own Chaos Engineering service with Datadog

Reliable systems are vital to meeting customer expectations. Downtime not only hurts a company’s bottom line but can also damage its reputation. Our goal at Gremlin is to help enterprises build more reliable systems using Chaos Engineering. Whether your infrastructure is deployed on bare metal in a corporate-owned data center or as Kubernetes-orchestrated microservices in a public cloud, chaos experiments can help you find system weaknesses early, before they affect customers.

Introducing the Datadog IoT Agent

From smart thermostats and grocery store checkouts to public utility infrastructures and industrial manufacturing lines, the Internet of Things (IoT) is all around us and growing larger every day. But this rapid growth brings operational challenges: IoT devices collect large amounts of data and are often distributed across harsh, ever-changing environments.

Diagnosing out-of-memory errors on Linux

Out-of-memory (OOM) errors occur when the Linux kernel can’t provide enough memory to run all of its user-space processes, causing at least one process to be terminated without warning. Without a comprehensive monitoring solution, OOM errors can be tricky to diagnose. In this post, you will learn how to use Datadog to diagnose OOM errors on Linux systems.

Test on-premise applications with Datadog Synthetic private locations

Synthetic monitoring lets you improve the end-user experience by proactively verifying that users can complete important transactions and access key endpoints. But your applications serve many users, from customers to all the employees who run your business. This makes testing the performance of any internal-facing services within your private network just as critical as monitoring your external-facing applications.

How to Use the Datadog CLI on Kubernetes | Datadog Tips & Tricks

In this video, you’ll learn how to use the Datadog command line interface (CLI) on Kubernetes to perform key tasks, including checking the status of the Agent and viewing custom checks. The Datadog Agent CLI allows you to check the status of the Agents running on the pods in your Kubernetes clusters. It also provides various helpful commands, including starting and stopping the Agent, viewing configured custom checks, and sending flares to the Datadog support team to automatically open troubleshooting tickets.

Datadog on RocksDB

Datadog is a monitoring and analytics platform that ingests trillions of data points per day, coming from more than 8,000 customers. Each data point is associated with metadata, mostly in the form of tags, and can also be part of a stream of related data points, which can then be explored, queried, or aggregated. RocksDB is used by many of the Datadog services that make up that metrics ingestion, aggregation, query, and index pipeline.

How to Manage Datadog Resources Using Terraform | Datadog Tips & Tricks

Terraform allows you to efficiently manage complex infrastructure environments, and Datadog is an important piece of those environments. With the Datadog provider, you can use Terraform to manage your Datadog resources as code, allowing you to create and edit resources with the same tool you’re already using for your infrastructure. This video will show you how to do just that through the example of creating a Datadog monitor.

How to Build A Unified Dashboard | Datadog Tips & Tricks

In this video, you’ll learn how to create unified dashboards that give your teams valuable information and performance visualizations from across the Datadog platform. Dashboards let your teams see data from across the platform side by side, providing holistic visibility and breaking down silos between Dev and Ops teams. The video walks through creating a Screenboard that shows data such as frontend system latency, backend system latency, and Service Level Objectives all in one place.

Identifying Environment Right Sizing Opportunities for Cost Efficiency | Datadog Tips & Tricks

In this video, you’ll learn how to use the host map to identify opportunities to rightsize your environment and make it more cost-efficient. MoneySuperMarket Group cut its cloud infrastructure costs by over 50% by using Datadog. This video unpacks some of the practices MoneySuperMarket’s engineering team used to accomplish that.