How to Overcome the Five Most Common Spark Challenges

Presented by

Alex Pierce, Pepperdata Field Engineer

About this talk

Review guidelines on how to conquer the most common Spark problems you are likely to encounter.

Apache Spark is a full-fledged data engineering toolkit that enables you to operate on large data sets without worrying about the underlying infrastructure. Spark is known for its speed, which comes from an improved implementation of MapReduce that focuses on keeping data in memory instead of persisting it to disk. Alongside these benefits, however, Spark has its issues, including complex deployment and scaling. How can you best deal with these and other challenges and maximize the value you get from Spark?

Drawing on experience across dozens of production deployments, Pepperdata Field Engineer Alexander Pierce explores issues observed in a cluster environment with Apache Spark and offers guidelines on how to overcome the most common Spark problems you are likely to encounter. Alex will accompany his presentation with demonstrations and examples. Attendees can use this information to improve the usability and supportability of Spark in their projects.

During this webinar, attendees will learn about:

– Serialization and its role in Spark performance
– Partition recommendations and sizing
– Executor resource sizing and heap utilization
– Driver-side vs. executor-side processing: reducing idle executor time
– Using shading to manage library conflicts
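As a rough illustration of the partition sizing topic, here is a minimal Python sketch of a common rule of thumb. It is not taken from the talk. It assumes a target of about 128 MiB per partition (the default of Spark's `spark.sql.files.maxPartitionBytes`) and at least 2 tasks per CPU core (the low end of the 2-3 tasks per core suggested in the Spark tuning guide). The function name `suggest_partitions` and its parameters are hypothetical, and real workloads may need different targets.

```python
MIB = 1024 * 1024


def suggest_partitions(total_bytes, total_cores,
                       target_partition_bytes=128 * MIB,
                       tasks_per_core=2):
    """Rule-of-thumb partition count (an assumption, not a Spark API).

    Takes the larger of:
      - the number of partitions needed to keep each one near the target size
      - enough partitions to give every core `tasks_per_core` tasks
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if total_cores <= 0 or target_partition_bytes <= 0 or tasks_per_core <= 0:
        raise ValueError("cores, target size and tasks_per_core must be positive")

    # Integer ceiling division avoids floating-point rounding on large sizes.
    by_size = (total_bytes + target_partition_bytes - 1) // target_partition_bytes
    by_cores = total_cores * tasks_per_core
    return max(by_size, by_cores, 1)


if __name__ == "__main__":
    # 10 GiB of input on a cluster with 16 executor cores in total.
    print(suggest_partitions(10 * 1024 * MIB, total_cores=16))  # prints 80
```

The result could be passed to `repartition` on a DataFrame or used as a starting value for `spark.sql.shuffle.partitions`, then adjusted based on observed task durations and spill.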

Pepperdata Capacity Optimizer delivers 30-47% greater cost savings for data-intensive workloads, eliminating the need for manual tuning by optimizing CPU and memory in real time with no application changes. Pepperdata pays for itself, immediately decreasing instance hours/waste, increasing utilization, and freeing developers from manual tuning to focus on innovation.