Enhancing Spark with H2O's Random Grid Search and AutoML using Sparkling Water

Presented by

Jakub Háva, Team Lead and Senior Software Engineer, H2O.ai

About this talk

Learn how to combine large-scale data preprocessing with machine learning using Sparkling Water. Sparkling Water lets you train H2O-3 models in a distributed manner on Apache Spark clusters. It also lets you use trained H2O-3 and Driverless AI models inside Apache Spark. We will demonstrate training various algorithms with hyper-parameter tuning (Cartesian and random grid search with a time constraint). We will also show AutoML, which trains a meta model by combining different algorithms, hyper-parameter search, and stacking (an ensemble method). All of this is done through the Spark Pipeline API. Finally, we will demonstrate how to use target encoding with the Sparkling Water API.

What users will learn:
- How to use H2O's GridSearch in a Sparkling Water environment
- How to use AutoML in a Sparkling Water environment
- How to put the trained models into production
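The talk uses Sparkling Water's Spark Pipeline API, which is not reproduced here. What follows is a library-free sketch, in plain Python, of what "random grid search with a time constraint" means. It enumerates the Cartesian grid, shuffles it with a fixed seed, and scores candidates until a model-count or wall-clock budget runs out. In H2O-3 these budgets correspond to the `max_models` and `max_runtime_secs` search criteria of the `RandomDiscrete` strategy. The function names (`cartesian_grid`, `random_grid_search`, `toy_validation_error`) are hypothetical. The scoring function is a stand-in for actually training a model.

```python
import itertools
import random
import time
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


def cartesian_grid(hyper_params: Dict[str, List[float]]) -> List[Params]:
    """Return every combination of hyper-parameter values (the Cartesian strategy)."""
    keys = sorted(hyper_params)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(hyper_params[k] for k in keys))
    ]


def random_grid_search(
    hyper_params: Dict[str, List[float]],
    score: Callable[[Params], float],
    max_models: Optional[int] = None,
    max_runtime_secs: Optional[float] = None,
    seed: int = 42,
) -> List[Tuple[float, Params]]:
    """Evaluate grid points in random order until a budget runs out.

    Candidates are drawn without replacement (a seeded shuffle of the
    Cartesian grid). The search stops when max_models candidates have been
    scored or max_runtime_secs of wall-clock time has elapsed, whichever
    comes first. Returns (score, params) pairs, lowest score first.
    """
    candidates = cartesian_grid(hyper_params)
    random.Random(seed).shuffle(candidates)
    deadline = None if max_runtime_secs is None else time.monotonic() + max_runtime_secs
    results: List[Tuple[float, Params]] = []
    for params in candidates:
        if max_models is not None and len(results) >= max_models:
            break
        if deadline is not None and time.monotonic() >= deadline:
            break
        results.append((score(params), params))
    results.sort(key=lambda pair: pair[0])  # sort by score only, never compare dicts
    return results


def toy_validation_error(params: Params) -> float:
    # Hypothetical stand-in for "train a model with params, return its validation error".
    return abs(params["max_depth"] - 6) + 10 * abs(params["learn_rate"] - 0.1)


grid = {"max_depth": [3, 5, 6, 8], "learn_rate": [0.01, 0.05, 0.1]}
print(len(cartesian_grid(grid)))  # → 12
top = random_grid_search(grid, toy_validation_error, max_models=5, max_runtime_secs=60, seed=1)
print(len(top))  # → 5
```

The candidates are a seeded permutation of the full grid. If you drop both budgets, the random search degenerates into an exhaustive Cartesian search. This shows how the two strategies relate: the random variant trades completeness for a bounded cost.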

H2O.ai is the maker of H2O, the world's best machine learning platform and Driverless AI, which automates machine learning. H2O is used by over 200,000 data scientists and more than 18,000 organizations globally. H2O Driverless AI does auto feature engineering and can achieve 40x speed-ups on GPUs.