
Datadog on Data Engineering Pipelines: Apache Spark at Scale

Datadog is an observability and security platform that ingests and processes tens of trillions of data points per day from more than 22,000 customers. Processing that amount of data in a reasonable time stretches the limits of well-known data engines like Apache Spark. In addition to scale, Datadog's infrastructure is multi-cloud on Kubernetes, and the data engineering platform is used by different engineering teams, so having a good set of abstractions that make running Spark jobs easier is critical.

Data Denormalization: Pros, Cons & Techniques for Denormalizing Data

The amount of data organizations handle has created the need for faster data access and processing. Data denormalization is a widely used technique to improve database query performance. This article discusses data denormalization, its importance, how it differs from data normalization, and common denormalization techniques. Importantly, I’ll also look at the pros and cons of this approach.
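As a rough illustration of the trade-off (not taken from the article itself), the sketch below uses Python's built-in sqlite3 module. The table names (customers, orders, orders_denorm) and the sample rows are hypothetical. It copies the customer name into a denormalized orders table so that reads avoid a join, then shows the main cost: the copied value goes stale when the source row changes.

```python
import sqlite3


def build_demo(conn):
    """Create hypothetical normalized tables plus a denormalized copy."""
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0);

        -- Denormalized copy: the customer name is duplicated into each order row.
        CREATE TABLE orders_denorm AS
            SELECT o.id, o.total, c.name AS customer_name
            FROM orders o JOIN customers c ON c.id = o.customer_id;
        """
    )


def normalized_read(conn):
    """Read orders with customer names via a join (normalized schema)."""
    return conn.execute(
        "SELECT o.id, c.name, o.total "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "ORDER BY o.id"
    ).fetchall()


def denormalized_read(conn):
    """Read the same information from a single table, with no join."""
    return conn.execute(
        "SELECT id, customer_name, total FROM orders_denorm ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_demo(conn)

# Both reads agree right after the copy is built.
same_before = normalized_read(conn) == denormalized_read(conn)

# Updating the source table does not touch the duplicated column,
# so the denormalized copy is now out of date until it is refreshed.
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
same_after = normalized_read(conn) == denormalized_read(conn)
```

Here same_before is True and same_after is False: the join-free read is simpler, but every duplicated value has to be kept in sync by the application or by a refresh job.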

Compactor: A Hidden Engine of Database Performance

This article was originally published in InfoWorld and is reposted here with permission. The compactor handles critical post-ingestion and pre-query workloads in the background on a separate server, enabling low latency for data ingestion and high performance for queries. The demand for high volumes of data has increased the need for databases that can handle both data ingestion and querying with the lowest possible latency (aka high performance).

Streaming conversion of Apache Kafka topics from JSON to Avro with Apache Flink

Pushing data in JSON format to an Apache Kafka topic is very common. However, messages without a predefined structure can cause problems, specifically when sinking the data via connectors, such as the JDBC sink, that require knowledge of the message structure. Transforming the messages from JSON to Avro enforces a schema on the messages and allows the use of a wider variety of connectors.

How Geometric Search Works for Hexagons in Elasticsearch

Geographic grid systems allow zooming into maps at progressively higher resolutions and finer grids. For rectangular grids, this is very simple, but for hexagonal grids, the situation is much more complex, since child hexagons are not fully contained within parent hexagons. This video demonstrates how we can still achieve efficient parent-child search in Elasticsearch using the H3 hexagonal grid.

Data lake vs. data mesh: Which one is right for you?

What’s the right way to manage growing volumes of enterprise data, while providing the consistency, data quality and governance required for analytics at scale? Is centralizing data management in a data lake the right approach? Or is a distributed data mesh architecture right for your organization? When it comes down to it, most organizations seeking these solutions are looking for a way to analyze data without having to move or transform it via complex extract, transform and load (ETL) pipelines.

How to Benchmark Cloud FinOps Effectively

Cloud FinOps is rapidly becoming popular among organizations in today’s digital age due to its ability to help manage financial operations more efficiently. This is because it allows organizations to track, measure, and optimize their cloud spend with greater visibility. It also helps improve operational efficiency by automating numerous financial processes, including billing, budgeting, auditing, and reporting.

OpenSearch vs Elasticsearch: Which One Is Better to Use?

Whenever we start a search consulting project from scratch, the obvious question is: which search engine to use? We’ve talked about Elasticsearch vs Solr before, but here we’ll compare Elasticsearch with its fork, OpenSearch. Chances are, if you need to decide between the two, you’ll be looking at a few dimensions.

Strategize your Azure migration for SQL workloads with Datadog

Migrating an on-prem database to a public cloud comes with a number of benefits, such as no longer needing to manage and maintain physical infrastructure, dynamic scaling, disaster recovery, and overall cost reduction. However, migrating to the cloud can often be a complex and daunting task. For instance, if an organization is a Microsoft shop with teams that rely on SQL Server databases, Azure is a natural fit for its needs.

Predictive Maintenance for Industrial IoT Devices at the Edge

In industrial operations, time is money. The more efficient processes and machinery are, the better it is for business. Providing proactive monitoring and maintenance of industrial machines, however, is not an easy task. This is especially true as these machines become increasingly complex and distributed. It’s not possible to have maintenance crews on site for every asset in a distributed system. The edge is where the physical world meets the digital world.