July 2020

Stabilizing Marathon: Part III

So far we have covered team culture, which amplifies our code culture and design. That was fairly abstract, and you’ll be forgiven if you skipped right away to this part. I will cover our test and release pipeline, which has probably had the biggest impact on Marathon’s stability. The pipeline enabled us to discover issues before our users did. I will first give an overview of the pipeline stages and then dive deep into the Loop. You will soon see what I mean by that.

Stabilizing Marathon: Part II

Part I covered our team culture, which applies to many different types of work and teams. This part covers the software engineering best practices that help us stabilize Marathon. Marathon is written in Scala and makes heavy use of Akka Actors and Streams. I probably don’t have to mention that Scala’s type system and its immutable data structures prevent a lot of bugs before we even run unit tests.

Stabilizing Marathon: Part I

This is a review of the last three years that we spent stabilizing Marathon. Marathon is the central workload scheduler in DC/OS. Most of the time when you launch an app or a service on DC/OS, it is Marathon that starts it on top of Apache Mesos. Mesos manages the compute and storage resources and Marathon orchestrates the workload. We sometimes dub it the “init.d of DC/OS”. Being such an integral part of DC/OS, we must ensure that it keeps functioning.

Double Header: Konvoy 1.5 and Kommander 1.1 Are GA!

Today we made Konvoy 1.5 and Kommander 1.1 generally available. In January, D2iQ defined a 12-month roadmap for Kommander and Konvoy. With these newest releases focused on the Single Enterprise Experience, that mission is halfway complete. Here are some of the highlights of the latest releases.

Q&A with Ziff Media Group: Why They Made the Switch to Kubernetes

Today’s leading companies are one step ahead of their competitors as they adopt new tools and disciplines emerging from the cloud native landscape. That was the case for Ziff Media Group, which is a collection of several media web properties including pcmag.com, mashable.com, deals.com, offers.com, and more.

O'Reilly eBook: Cloud Native Containers and Next-Gen Apps

Developers often struggle when first encountering the cloud. Learning about distributed systems, becoming familiar with technologies such as containers and functions, and knowing how to put everything together can be daunting. With this practical guide, you'll get up to speed on patterns for building cloud native applications and best practices for common tasks such as messaging, eventing, and DevOps. Authors Boris Scholl, Trent Swanson, and Peter Jausovec describe the architectural building blocks for a modern cloud native application. You'll learn how to use microservices, containers, serverless computing, storage types, portability, and functions.

KUDO for Kubeflow: The Enterprise Machine Learning Platform

Machine learning is the power cable for your business. Without it, your data center is a museum of hard drives. While machine learning can supercharge data-driven businesses, it requires both expertise and a complex suite of technologies to make it work. D2iQ’s KUDO for Kubeflow, which is in technical preview, is the enterprise platform designed to take you from prototype to production in no time.

Introducing Conductor

It comes as no surprise that the demand for Kubernetes is skyrocketing across the industry. According to the CNCF’s 2019 survey, 78% of respondents are using Kubernetes in production today. This growth is contributing to a surge in demand for talent: there are over 100,000 cloud native job postings across Dice and Indeed alone. The pool of people who have worked with Kubernetes and the adjacent technologies is limited, and demand is growing.

Managing Kubernetes: From a Small Fleet to a Navy of Clusters

To keep pace with the ever-changing digital landscape, organizations are adopting open source and cloud native technologies at an incredible pace. But as the number of clusters and workloads grows, it can become increasingly difficult to know where clusters exist and how they are performing. And if multiple teams are provisioning and using clusters with different policies, roles, and configurations, you might as well jump ship. Before you know it, you'll begin to experience cluster sprawl, and your multi-cluster operations could capsize before you reach shore.