Spark is a powerful and robust open-source, general-purpose computation platform. It is an invaluable tool for users who want to munge, wrangle, clean and transform data before training a model. Spark Pipelines are also powerful constructs, but they offer little support for plugging in advanced third-party machine learning libraries.
At the same time, many novice and advanced data scientists are leveraging the power of the H2O machine learning platform, a highly distributable and tunable machine learning library. The H2O platform provides the powerful MOJO (Model Object, Optimized) format, which makes it easy to deploy trained models with a focus on scoring speed, traceability, exchangeability and backward compatibility.
In this webinar, Edgar will introduce H2O Sparkling Water, the glue between Spark and the H2O ML platform, which allows users to seamlessly incorporate advanced data science libraries into their Spark environments. We will demonstrate how to create Spark pipelines that integrate H2O ML models and how to deploy them using Scala or Python. We will use H2O’s AutoML algorithm for automatic model selection and ensembling, and show how to load the resulting production-grade model into a Spark pipeline for deployment.