Memory management is at the heart of any data-intensive system. Spark, in particular, must arbitrate memory allocation between two main use cases: buffering intermediate data for processing (execution) and caching user data (storage). This talk will take a deep dive through the memory management designs adopted in Spark since its inception and discuss their performance and usability implications for the end user.
No matter at what stage of your data journey you’re in, this channel will help you get a better understanding of the fundamental concepts of the Databricks Lakehouse platform and the problems we’re helping to solve for data teams.…