July 2019

Running Spark with Jupyter Notebook & HDFS on Kubernetes

Kublr and Kubernetes can help make your favorite data science tools easier to deploy and manage. Hadoop Distributed File System (HDFS) carries the burden of storing big data, Spark provides many powerful tools to process that data, and Jupyter Notebook is the de facto standard UI for dynamically managing queries and visualizing results.

Kubernetes and the Data Layer

Once you get your head around the concept of containers, and subsequently the need for management and orchestration with tools like Kubernetes, what started off as a weekend project suddenly raises more questions than answers. Kubernetes removes much of the complexity of managing the interaction between applications and the underlying infrastructure. It is designed to let developers focus on their applications and solutions rather than worry about the complexity of the hosting platform.