Introduction to Sparkling Water: Productionalizing H2O Models with Apache Spark

Presented by

Edgar Orendain, Software Engineer,

About this talk

Spark is a powerful, robust, open-source, general-purpose computation platform. It is an invaluable tool for users who want to munge, wrangle, clean and transform data before training a model. Spark Pipelines are also powerful constructs, but they offer little support for plugging in advanced third-party machine learning libraries. At the same time, many data scientists, novice and advanced alike, rely on the H2O machine learning platform, a highly distributable and tunable machine learning library. The H2O platform provides the MOJO (Model Object, Optimized) concept, which makes it easy to deploy trained models with a focus on scoring speed, traceability, exchangeability and backward compatibility. In this webinar, Edgar will introduce H2O Sparkling Water, the glue between Spark and the H2O ML platform, which lets users incorporate advanced data science libraries into their Spark environments. We will demonstrate how to create Spark pipelines that integrate H2O ML models and how to deploy them using Scala or Python. We will use H2O's AutoML algorithm for automatic model selection and ensembling, and show how to load the resulting production-grade model into a Spark pipeline for deployment.

The company behind this channel is the maker of H2O, the world's best machine learning platform, and Driverless AI, which automates machine learning. H2O is used by over 200,000 data scientists and more than 18,000 organizations globally. H2O Driverless AI performs automatic feature engineering and can achieve 40x speed-ups on GPUs.