Building a Big Data Stack on Kubernetes

Logo
Presented by

Sean Suchter

About this talk

There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark). Intended for software engineers, developers, architects and technical leads who develop Spark applications, this session will discuss how to build a big data stack on Kubernetes. In particular, it will show how Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping of Kubernetes containers to physical nodes to HDFS datanode daemons. You’ll also learn how you can provide Spark with the high availability of the critical HDFS namenode service when running HDFS in Kubernetes.
Related topics:

More from this channel

Upcoming talks (0)
On-demand talks (117)
Subscribers (6408)
Pepperdata is the Big Data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiencies, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer Big Data systems on-premises and in the cloud.