Open source tooling for productionising and deploying machine learning (ML) models, and for managing scalable data pipelines and data science experiments, is still in its early days. Kubeflow is a collection of tools well suited to these use cases, and it is gaining popularity for good reason.
This talk describes a system built on top of Kubeflow which is generic enough to manage ML pipelines of various shapes and sizes, yet flexible enough to allow entirely custom workflows. At its core is a set of conventions that determine where data is read from and written to, together with a way of expressing data preprocessing and models as configurations of composable objects and functions. This approach makes it trivial to add new models, datasets, and training objectives to a production system, and enables training and deploying stacked models of arbitrary complexity.
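The convention-based, composable-configuration idea could be sketched roughly as follows. All names here (the registry, `PipelineConfig`, the path scheme) are hypothetical illustrations, not the talk's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical registry of named, composable preprocessing steps.
# Adding a new step is just adding an entry here.
PREPROCESSORS: Dict[str, Callable[[List[str]], List[str]]] = {
    "strip": lambda rows: [r.strip() for r in rows],
    "lowercase": lambda rows: [r.lower() for r in rows],
}

@dataclass
class PipelineConfig:
    """Declarative pipeline description; I/O locations follow a convention."""
    name: str
    preprocessors: List[str] = field(default_factory=list)

    @property
    def input_path(self) -> str:
        # Convention: data locations are derived from the pipeline name,
        # so individual steps never hard-code paths.
        return f"/data/{self.name}/input"

    @property
    def output_path(self) -> str:
        return f"/data/{self.name}/output"

def run(config: PipelineConfig, rows: List[str]) -> List[str]:
    # Compose the configured steps in order; stacking pipelines is just
    # composing more configurations.
    for step in config.preprocessors:
        rows = PREPROCESSORS[step](rows)
    return rows

config = PipelineConfig(name="demo", preprocessors=["strip", "lowercase"])
print(run(config, ["  Hello ", "WORLD"]))  # -> ['hello', 'world']
```

Because each step is referenced by name in the configuration, swapping or stacking components requires only a configuration change, not new orchestration code.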