Five Mistakes to Avoid When Using Spark

Logo
Presented by

Alex Pierce, Pepperdata Field Engineer

About this talk

Learn how to avoid common mistakes managing Spark in a cluster environment and improve its usability Apache Spark is playing a critical role in the adoption and evolution of Big Data technologies because it provides sophisticated ways for enterprises to leverage Big Data compared to Hadoop. The increasing amounts of data being analyzed and processed through the framework is massive and continues to push the boundaries of the engine. Drawing on experiences across dozens of production deployments, Pepperdata Field Engineer Alexander Pierce explores issues observed in a cluster environment with Apache Spark and offers guidelines on how to avoid common mistakes. Attendees can use these observations to improve the usability and supportability of Spark and avoid such issues in their projects. Topics include: – Serialization – Partition sizes – Executor resource sizing – DAG management – Shading
Related topics:

More from this channel

Upcoming talks (0)
On-demand talks (117)
Subscribers (6408)
Pepperdata is the Big Data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiencies, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer Big Data systems on-premises and in the cloud.