FOSDEM - Costa Tsaousis: Netdata Open Source Distributed Observability Pipeline Journey & Challenges

Feb 28, 2024

ABSTRACT:

Netdata is a powerful open-source, distributed observability pipeline designed to provide higher fidelity, easier scalability, and a lower cost of ownership compared to traditional monitoring solutions. This presentation will offer an in-depth overview of the journey we've undertaken in building Netdata, highlighting the challenges we've faced and the innovative solutions we've developed to address them.

We will begin with the history of Netdata, starting from its inception. We'll discuss the initial goals of creating a monitoring tool that could offer high-resolution metrics, automatic metric detection, and real-time visualization, all with minimal configuration. We'll also explore how Netdata rapidly gained attention and support from the open-source community.

The presentation will then shift focus to the evolution of Netdata, including the development of:

  • a very efficient database engine (Netdata vs Prometheus: 35% less CPU utilization, 49% less memory, 12% less bandwidth, 98% less disk I/O, and 75% less disk space for the same dataset),
  • a powerful alerting engine that uses statistical analysis, machine learning, and templates for fully automated alerts,
  • a streaming protocol that supports live data streaming, metric replication, and routing of interactive queries between Netdata agents.

Next, we'll discuss Netdata's transition into a distributed monitoring solution and the decision to seek funding to build a sustainable company around the open-source project. We'll emphasize the core concept of the Netdata SaaS model, which keeps monitoring features open-source while offering additional benefits like better integration, RBAC, and support through the SaaS platform.

The presentation will showcase how Netdata's architecture allows for distributed monitoring, vertical and horizontal scalability, and real-time visibility across infrastructure. We'll discuss the challenges we faced and overcame: the creation of a custom database engine; the streaming protocol as a fundamental element of Netdata's distributed nature; the introduction of unsupervised anomaly detection; the implementation of distributed queries; the need for clarity and transparency during visualization; and the user interface concepts we used to avoid exposing a query editor to users.

We'll also touch upon the challenges that remain, such as the use of WebRTC for browser-to-agent communication, the need to standardize metric naming conventions, the development of an expert system for metric interpretation, support for infrastructure-level alerts in a distributed fashion, and the standardization of user experiences.

By the end of this presentation, the audience will gain a deep understanding of Netdata's evolution, its unique features, and the challenges and solutions encountered along the way. They will also take away insights into building and maintaining large-scale open-source projects, as well as actionable ideas for improving their own monitoring and observability solutions.
