Deep Learning on Apache® Spark™: Workflows and Best Practices

Presented by Tim Hunter and Jules S. Damji

About this talk

The combination of Deep Learning with Apache Spark has the potential for tremendous impact in many sectors of the industry. This webinar, based on the experience gained in assisting customers with the Databricks Virtual Analytics Platform, will present some best practices for building deep learning pipelines with Spark. Rather than comparing deep learning systems or specific optimizations, this webinar will focus on issues that are common to deep learning frameworks when running on a Spark cluster, including:

* optimizing cluster setup;
* configuring the cluster;
* ingesting data; and
* monitoring long-running jobs.

We will demonstrate the techniques we cover using Google’s popular TensorFlow library. More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters. Clusters can be configured to avoid task conflicts on GPUs and to allow the use of multiple GPUs per worker. Setting up pipelines for efficient data ingestion improves job throughput, and monitoring helps both with cluster configuration and with keeping long-running deep learning jobs stable.
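As a rough illustration of the kind of cluster configuration discussed above, the sketch below shows one way to reserve a whole GPU per Spark task and pin TensorFlow to the device that task was granted. It assumes Spark 3.x GPU-aware scheduling (the spark.executor.resource.gpu.* and spark.task.resource.gpu.* settings); the resource amounts, the discovery-script path /opt/spark/getGpus.sh, and the function names are placeholders, and the webinar itself may rely on different, Databricks-specific mechanisms, so treat this as a sketch rather than the presented approach.

from pyspark.sql import SparkSession
from pyspark import TaskContext
import os

# Assumed configuration: 4 GPUs advertised per executor, one full GPU per task,
# so two deep learning tasks never contend for the same device.
spark = (
    SparkSession.builder
    .appName("dl-on-spark-gpu-sketch")
    .config("spark.executor.resource.gpu.amount", "4")
    .config("spark.task.resource.gpu.amount", "1")
    .config("spark.executor.resource.gpu.discoveryScript", "/opt/spark/getGpus.sh")  # placeholder path
    .getOrCreate()
)

def pin_gpu_and_score(rows):
    # Spark tells each task which GPU address it was granted; exporting
    # CUDA_VISIBLE_DEVICES before importing TensorFlow pins the task to it.
    gpu = TaskContext.get().resources()["gpu"].addresses[0]
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu
    import tensorflow as tf  # deferred import so TensorFlow honors the pinned device
    # ... build or load a model here and run training/inference over `rows` ...
    yield (gpu, len(tf.config.list_physical_devices("GPU")))

# Each partition becomes one task, and each task sees exactly one GPU.
print(
    spark.sparkContext.parallelize(range(8), numSlices=8)
    .mapPartitions(pin_gpu_and_score)
    .collect()
)

Deferring the TensorFlow import until after CUDA_VISIBLE_DEVICES is set is the key point of the sketch: it keeps each task's TensorFlow session from grabbing memory on every GPU attached to the worker.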
