Improve Big Data Performance on Google Dataproc: Best Practices

Presented by

Kirk Lewis, Pepperdata Field Engineer

About this talk

Managing your Google Cloud costs and recapturing resource waste can be challenging. The promise of autoscaling is that workloads receive exactly the cloud computational resources they require at any given time, and you pay only for the server resources you need, when you need them. However, most autoscaling features aren’t granular enough to address today’s variable workload and application needs. Without granular, automatic control of your cloud instance resources, you may be paying more than you should for your workloads. Autoscaling typically works well on clusters that process many jobs as well as on single-job clusters. While some applications are constant and predictable, others are bound by CPU or memory, or are “spiky” in nature. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but with different approaches. Estimating the right number of cluster nodes for a workload can be difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive. Join Pepperdata Field Engineer Kirk Lewis for this discussion of the operational challenges of maintaining optimal big data performance in the cloud, with a focus on Google Dataproc, the milestones to set, and best practices for managing a successful cloud framework.

Pepperdata is the Big Data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiencies, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer Big Data systems on-premises and in the cloud.