
Improve Big Data Performance on Google Dataproc: Best Practices

Presented by

Kirk Lewis, Pepperdata Field Engineer

About this talk

Managing your Google Cloud costs and recapturing resource waste can be challenging. The promise of autoscaling is that workloads receive exactly the cloud computational resources they require at any given time, and you pay only for the server resources you need, when you need them. However, most autoscaling features aren't granular enough to address today's variable workload and application needs. Without granular, automatic control over your cloud instance resources, you may be paying more than you should for your workloads.

Autoscaling typically works well on clusters that process many jobs as well as on single-job clusters. While some applications are constant and predictable, others are bound by CPU or memory, or are "spiky" in nature. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but with different approaches. Estimating the right number of cluster nodes for a workload can be difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive.

Join Pepperdata Field Engineer Kirk Lewis for this discussion about the operational challenges of maintaining optimal big data performance in the cloud, with a focus on Google Dataproc, what milestones to set, and best practices for managing a successful cloud framework.
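For context on what Dataproc's built-in autoscaling looks like in practice (this sketch is not part of the talk, and the values are illustrative placeholders, not recommendations), a Dataproc autoscaling policy is defined in YAML roughly like this:

```yaml
# Illustrative Dataproc autoscaling policy (values are placeholders)
workerConfig:
  minInstances: 2        # primary workers the cluster never scales below
  maxInstances: 20       # upper bound on primary workers
secondaryWorkerConfig:
  maxInstances: 50       # preemptible/secondary workers for spiky load
basicAlgorithm:
  cooldownPeriod: 4m     # wait between scaling evaluations
  yarnConfig:
    scaleUpFactor: 0.05            # fraction of pending YARN memory to add per cycle
    scaleDownFactor: 1.0           # fraction of available YARN memory to remove
    gracefulDecommissionTimeout: 1h  # let running containers finish before node removal
```

The policy is then attached to a cluster (for example, via `gcloud dataproc autoscaling-policies import` and the cluster's `--autoscaling-policy` flag). The talk's point stands here: these knobs operate at the node level per scaling cycle, not per application, which is why a spiky or memory-bound job can still leave a cluster over- or under-provisioned.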
Pepperdata

Real-time, automated cloud cost optimization with no manual tuning
Pepperdata Capacity Optimizer delivers 30-47% greater cost savings for data-intensive workloads, eliminating the need for manual tuning by optimizing CPU and memory in real time with no application changes. Pepperdata pays for itself: it immediately decreases instance hours and waste, increases utilization, and frees developers from manual tuning to focus on innovation.