InfluxData

How We Did It: Data Ingest and Compression Gains in InfluxDB 3.0

A few weeks ago, we published benchmarks showing that InfluxDB 3.0 performs orders of magnitude better than previous versions of InfluxDB and, by extension, other databases as well. Two key factors drive these gains: data ingest and data compression. This raises the question: how did we achieve such drastic improvements in our core database? This post explains how we accomplished them.

Build a Data Streaming Pipeline with Kafka and InfluxDB

InfluxDB and Kafka aren’t competitors; they’re complementary. Streaming data, and more specifically time series data, travels in high volumes and at high velocities. Adding InfluxDB to your Kafka cluster provides specialized handling for your time series data. This specialized handling includes real-time queries and analytics, and integration with cutting-edge machine learning and artificial intelligence technologies. Companies such as Hulu have paired their InfluxDB instances with Kafka.

Mage.ai for Tasks with InfluxDB

Any existing InfluxDB user will notice that InfluxDB underwent a transformation with the release of InfluxDB 3.0. InfluxDB v3 provides 45x better write throughput and has 5-25x faster queries compared to previous versions of InfluxDB (see this post for more performance benchmarks). We also deprioritized several features that existed in 2.x to focus on interoperability with existing tools. One of the deprioritized features that existed in InfluxDB v2 is the task engine.

The Plan for InfluxDB 3.0 Open Source

The commercial version of InfluxDB 3.0 is a distributed, scalable time series database built for real-time analytic workloads. It supports infinite cardinality, SQL and InfluxQL as native query languages, and manages data efficiently in object storage as Apache Parquet files. It delivers significant gains in ingest efficiency, scalability, data compression, storage costs, and query performance on higher cardinality data.

Time Series Is out of This World: Data in the Space Sector

While time series data is critical for space industries, managing that data is not always straightforward. Humans have yet to develop light-speed travel, teleportation or many of the other cool things we see in movies or read about in books, but that doesn’t mean we aren’t making progress. Advances in technology are starting, ever so slowly, to blur the lines between science fiction and reality when it comes to outer space.

Introduction to Apache Arrow

A look at what Arrow is, its advantages and how some companies and projects use it. Over the past few decades, working with big data sets has required businesses to perform increasingly complex analyses. Advancements in query performance, analytics and data storage are largely a result of greater access to memory. Demand, manufacturing process improvements and technological advances all contributed to cheaper memory.