Presented by

Heidi Carson and Alex Pierce

About this talk

This talk explores the results of analyzing thousands of Spark jobs on many multi-tenant production clusters. We will discuss the common issues we have seen, their symptoms, and how you can address them without thinking too hard. Pepperdata big data performance management gathers trillions of performance data points on hundreds of production clusters running Spark, covering a variety of industries, applications, and workload types. Drawing on the behavior and performance of thousands of Spark applications and on use case data from the Pepperdata Big Data Performance report, Heidi and Alex will discuss key performance insights. Topics include best and worst practices, gotchas, machine learning, and tuning recommendations.