Packaging, Deploying, and Running Apache Spark Applications in Production

Presented by

Saba El-Hilo, Data Engineer, Mapbox

About this talk

Apache Spark has proven indispensable across a wide range of applications and use cases. Developers, data scientists, engineers, and analysts alike can benefit from its power. However, deterministically managing dependencies and then packaging, testing, scheduling, and deploying a Spark application can be challenging. As organizations grow, these individuals become dispersed across multiple teams and departments, so a team-specific solution no longer works. What type of tooling do you need to let these individuals focus solely on writing a Spark application? And, more importantly, how do you enforce development best practices such as unit testing, continuous integration, version control, and deployment environments? The data engineering team at Mapbox has developed tooling and infrastructure to address these challenges and enable individuals across the organization to build and deploy Spark applications. This talk will walk you through our solution for packaging, deploying, scheduling, and running Spark applications in production. We will also address some of the problems we’ve faced and how the adoption process is evolving across the team.
