Managing your Google Cloud costs and recapturing resource waste can be challenging. The promise of autoscaling is that workloads receive exactly the cloud computational resources they require at any given time, and you only pay for the server resources you need, when you need them. However, most autoscaling features aren't granular enough to address today's variable workload and application needs. Without granular and automatic control over your cloud instance resources, you may be paying more than you should for your workloads.
Autoscaling typically works well both on clusters that process many jobs and on single-job clusters. While some applications are constant and predictable, others are bound by CPU or memory, or are "spiky" in nature. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but each takes a different approach.
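As an illustration of how Dataproc exposes autoscaling controls, here is a sketch of an autoscaling policy in the YAML format Dataproc accepts; the specific values shown (instance counts, scale factors, timeouts) are hypothetical and would need tuning for a real workload:

```yaml
# Sketch of a Dataproc autoscaling policy (values are illustrative only)
workerConfig:
  minInstances: 2        # floor for primary workers
  maxInstances: 10       # ceiling for primary workers
basicAlgorithm:
  cooldownPeriod: 2m     # wait between scaling evaluations
  yarnConfig:
    scaleUpFactor: 0.5   # fraction of pending YARN memory to add capacity for
    scaleDownFactor: 0.5 # fraction of idle YARN memory to remove capacity for
    gracefulDecommissionTimeout: 1h  # drain running jobs before removing nodes
```

A policy like this would be registered with `gcloud dataproc autoscaling-policies import` and then attached to a cluster, after which Dataproc adjusts worker counts based on YARN memory metrics rather than requiring manual resizing.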
Estimating the right number of cluster nodes for a workload can be difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive.
Join Pepperdata Field Engineer Kirk Lewis for a discussion of the operational challenges of maintaining optimal big data performance in the cloud, with a focus on Google Cloud Dataproc, what milestones to set, and best practices for managing a successful cloud framework.