Different Strategies of Scaling H2O Machine Learning on Apache Spark

Presented by

Jakub Hava, Software Engineer at H2O.ai

About this talk

Sparkling Water integrates H2O, an open source distributed machine learning platform, with the capabilities of Apache Spark. It lets users call H2O’s machine learning algorithms from Apache Spark applications via Scala, Python, R, or H2O’s Flow GUI, which makes Sparkling Water a strong enterprise solution. Sparkling Water 2.0 was built to coincide with the release of Apache Spark 2.0 and introduces several new features. One of the latest and largest is the ability to configure Sparkling Water for different workloads, and to scale and optimize the platform according to your data and needs. In this talk we will introduce the basic architecture of Sparkling Water, go over the different scaling strategies, and explain the pros and cons of each. We will also compare the approaches with regard to specific use cases and explain why each may or may not be a good fit for a given use case. The talk will finish with a live demo of the approaches discussed, which should give you a real-time view of configuring and running Sparkling Water for your own use case(s).
