March 2020

Datadog's Trace Outliers automatically surfaces error patterns across your environment

When monitoring highly distributed applications, which might rely on hundreds of services and infrastructure components across multiple cloud-based and on-premises environments, identifying problems and pinpointing the origin of an issue can be challenging. Even if you already have robust monitoring and alerts, your infrastructure and applications will likely change over time, which may make it difficult to reliably detect irregular behavior.

Monitor Apache Flink with Datadog

Apache Flink is an open source framework, written in Java and Scala, for stateful processing of real-time and batch data streams. Flink offers robust libraries and layered APIs for building scalable, event-driven applications for data analytics, data processing, and more. You can run Flink as a standalone cluster or use infrastructure management technologies such as Mesos and Kubernetes.

Monitor vSphere with Datadog

VMware vSphere is a server virtualization platform that enables organizations to provision and manage virtual machines at scale. With its comprehensive suite of products, vSphere helps companies manage datacenter resources, migrate workloads without downtime, run applications with high availability, and more. To keep tabs on dynamic vSphere environments and effectively address resource bottlenecks, you need deep visibility across every part of your infrastructure.

Monitor Scylla with Datadog

Scylla is an open source database alternative to Apache Cassandra, built to deliver significantly higher throughput, single-digit millisecond latency, and always-on availability for real-time applications. Unlike Cassandra, which is written in Java, Scylla is implemented in C++ to provide greater control over low-level operations and eliminate latency issues related to garbage collection.

NGINX 502 Bad Gateway: Gunicorn

Gunicorn is a popular application server for Python applications. It uses the Web Server Gateway Interface (WSGI), which defines how a web server communicates with and makes requests to a Python application. In production, Gunicorn is often deployed behind an NGINX web server. NGINX proxies web requests and passes them on to Gunicorn worker processes that execute the application.
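
To make the WSGI contract concrete, here is a minimal sketch of a WSGI application in Python, along with a small helper that imitates what a server such as Gunicorn does for a single request. The names `application` and `call_app` are illustrative assumptions, not taken from the original post; `call_app` is a stand-in for a real server and uses only the standard library's `wsgiref` utilities.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: receives the request environ and a
    start_response callback, and returns an iterable of byte strings."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper that imitates a WSGI server handling one request."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a plausible default request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # A real server would begin writing the HTTP response here.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = call_app(application)
```

Assuming the callable is saved in a module named `app.py`, running `gunicorn app:application` would serve it, and NGINX could then proxy requests to the address Gunicorn is bound to.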

Best practices for tagging your monitors

Tags provide critical context for troubleshooting issues across any dimension of your environment. By applying best practices for tagging your systems, you can efficiently organize and analyze all your monitoring data, and set up automated multi alerts to streamline alerting workflows. Similar to any tags you would add to your services and infrastructure, monitor tags—tags that you apply to your monitors—are an essential feature for organizing and simplifying your workflows.

Monitoring in the Kubernetes era

Container technologies have taken the infrastructure world by storm. Ideal for microservice architectures and environments that scale rapidly or have frequent releases, containers have seen a rapid increase in usage in recent years. But adopting Docker, containerd, or other container runtimes introduces significant complexity in terms of orchestration. That’s where Kubernetes comes into play.

Monitoring Kubernetes performance metrics

As explained in Part 1 of this series, monitoring a Kubernetes environment requires a different approach than monitoring VM-based workloads or even unorchestrated containers. The good news is that Kubernetes is built around objects such as Deployments and DaemonSets, which provide long-lived abstractions on top of dynamic container workloads.

Collecting metrics with built-in Kubernetes monitoring tools

In the previous post in this series, we dug into the data you should track so you can properly monitor your Kubernetes cluster. Next, you will learn how you can start inspecting your Kubernetes metrics and logs using free, open source tools. In this post we’ll cover several ways of retrieving and viewing observability data from your Kubernetes cluster.

Monitoring Kubernetes with Datadog

If you’ve read Part 3 of this series, you’ve learned how you can use different Kubernetes commands and add-ons to spot-check the health and resource usage of Kubernetes cluster objects. In this post we’ll show you how you can get more comprehensive visibility into your cluster by collecting all your telemetry data in one place and tracking it over time.

NGINX 502 Bad Gateway: PHP-FPM

This post is part of a series on troubleshooting NGINX 502 Bad Gateway errors. If you’re not using PHP-FPM, check out our other article on troubleshooting NGINX 502s with Gunicorn as a backend. PHP-FastCGI Process Manager (PHP-FPM) is a daemon for handling web server requests for PHP applications. In production, PHP-FPM is often deployed behind an NGINX web server. NGINX proxies web requests and passes them on to PHP-FPM worker processes that execute the PHP application.

Introducing wildcard-filtered metric queries

Tags are essential for your teams to quickly and efficiently filter through and find the information they need among the huge scope of data generated by your cloud infrastructure. Given that modern environments are always changing, with hosts and containers continuously being added or replaced, you need to be able to dynamically scope your queries so that you’re not rewriting the same searches over and over again.
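
As a rough illustration of why wildcard scoping helps in a changing environment, the sketch below filters a set of hosts by a tag pattern using Python's standard `fnmatch` module. It only illustrates the general idea and is not Datadog's query syntax; the host names, tag values, and the `filter_by_tag` helper are all hypothetical.

```python
from fnmatch import fnmatchcase


def filter_by_tag(hosts, pattern):
    """Return the names of hosts that have at least one tag matching
    the shell-style wildcard pattern (case-sensitive)."""
    return [
        name
        for name, tags in hosts.items()
        if any(fnmatchcase(tag, pattern) for tag in tags)
    ]


# Hypothetical inventory: host name -> list of "key:value" tags.
hosts = {
    "web-1": ["service:web-store", "env:prod"],
    "web-2": ["service:web-checkout", "env:prod"],
    "db-1": ["service:postgres", "env:prod"],
}

# One pattern covers every current and future "web-*" service,
# so the search does not need rewriting as services are added.
matched = filter_by_tag(hosts, "service:web-*")
```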