There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark).
Intended for software engineers, developers, architects, and technical leads who build Spark applications, this session will discuss how to run a big data stack on Kubernetes. In particular, Sean will demonstrate:
–The official Apache Spark 2.3 Kubernetes integration
–How the Spark scheduler can still provide HDFS data locality on Kubernetes, by discovering the mapping from Kubernetes pods to the physical nodes that run the HDFS datanode daemons.
–How to keep the critical HDFS namenode service highly available for Spark when running HDFS in Kubernetes.
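As a taste of the first point, the official Spark 2.3 Kubernetes integration is driven through spark-submit with a `k8s://` master URL. The sketch below follows the Spark 2.3 documentation; the API server address, image name, and jar path are placeholders you would substitute for your own cluster.

```shell
# Submit the bundled SparkPi example to a Kubernetes cluster (Spark 2.3+).
# <k8s-apiserver-host>:<port> and <spark-image> are placeholders for your
# cluster's API server endpoint and a Spark container image you have built.
bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

In cluster mode, the driver itself runs as a Kubernetes pod and requests executor pods from the API server, so no standalone Spark cluster needs to be pre-installed.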