Datadog On Caching

May 2, 2023

Caching (and cache invalidation!) is often cited as one of the hardest problems in computer science. While caching can bring substantial performance improvements, reasoning about cached data can be extremely difficult, because caching fundamentally means that you are no longer reading from your source of truth. Despite that difficulty, many teams at Datadog needed to build distributed caches to scale their services and keep latency low.

As Datadog grew in size and complexity, having each team design and operate its own caching solution became a bottleneck and added further complexity. Based on that experience, a team was created to design, build, and maintain a managed service for distributed in-memory caching, providing an easy way for over 2,000 engineers at Datadog to add fast caching to their systems in a scalable, reliable, and consistent manner.

In this session, Ara Pulido, Staff Developer Advocate, will chat with Mitch Ward and Jessica Cordonnier, engineering managers on the Caching team at Datadog. They will explain how they applied lessons from prior cache implementations, along with distributed systems principles, to design the caching platform at Datadog. They will cover the various components that make up the platform, including the storage system, data structures, and scaling solutions.

By the end of the session, you will have a better understanding of caching systems, their potential pitfalls and how to mitigate them, and how to run cache infrastructure as an internal platform as a service. Unfortunately, we can't offer any help naming your internal caching platform; that's another difficult computer science problem for another time!

00:00 - Introduction

04:20 - Introduction to caching

10:23 - History of caching at Datadog

16:44 - Datadog's Caching team

19:45 - Designing Ephemera

26:05 - System architecture

31:44 - Improving data persistence

35:33 - Network is hard

39:20 - Internal managed services

47:25 - Ephemera in the future

49:47 - Key takeaways

51:55 - Q&A