When Cloud Costs Run Amok: Big Data Architect’s Worst Nightmare

When cloud costs start to spiral out of control, how do you make sense of it? How do you optimize your big data performance and manage runaway cloud costs? And finally, how do you ensure you’re meeting the needs of your cloud users? Not having the answers to these questions can be a nightmare.

As a big data architect, you’re responsible for quickly resolving complex issues, improving platform performance and efficiency, managing a smooth cloud migration, determining new technology adoptions, and reducing cloud costs.

Face and eliminate your worst big data nightmare with Pepperdata Field Engineer Kirk Lewis as he presents best practices for automatic optimization, cloud efficiency, and cost optimization. Topics include:

- Optimizing your modern big data architecture
- Delivering deep insight from across more data sources and types
- Managing the increased velocity of analytics requests, across the cloud and on premises
- Controlling costs and resources while supporting more users who want self-service capabilities
- Meeting the needs of data scientists and data analysts
Recorded Aug 3 2021 44 mins
Presented by
Pepperdata Field Engineer, Kirk Lewis

  • Improve Big Data Performance on Dataproc: Best Practices Oct 26 2021 5:00 pm UTC 45 mins
    Kirk Lewis, Pepperdata Field Engineer
    Managing your Google Cloud costs and recapturing resource waste can be challenging. The promise of autoscaling is that workloads receive exactly the cloud computational resources they require at any given time, and you only pay for the server resources you need, when you need them. However, most autoscaling features aren’t granular enough to address today’s variable workload and application needs. Without granular and automatic control of your cloud instance resources, you may be paying more than you should for your workloads.

    Autoscaling typically works well on clusters that process many jobs as well as single-job clusters. While some applications are constant and predictable, others are bound by CPU or memory, or are “spiky” in nature. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but with different approaches.

    Estimating the right number of cluster nodes for a workload can be difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive.

    Join Pepperdata Field Engineer Kirk Lewis for this discussion about operational challenges associated with maintaining optimal big data performance in the cloud with a focus on Google Dataproc, what milestones to set, and best practices for managing a successful cloud framework.
  • Getting Started with Kubernetes the Right Way Oct 19 2021 5:00 pm UTC 75 mins
    Nigel Poulton, Best-Selling Kubernetes Author
    Join Best-Selling Author Nigel Poulton to learn what Kubernetes is, why it's central to the future of cloud-native infrastructure and applications, and what it means to your career. Participate in the Q and A that follows the webinar for a chance to win a free copy of Quick Start Kubernetes.

    BIO
    Nigel is a tech-oholic with a passion for demystifying exciting technologies. He's written several best-selling books and video training courses, and he has helped well over 1 million people take their first steps with Docker and Kubernetes. When he's not playing with technology, he likes cars, sci-fi, and spending time with his family. He loves to connect with people, and you can reach him at nigelpoulton.com. He's also on most social media platforms under the "nigelpoulton" handle.

    Twitter: @nigelpoulton
  • Managing Big Data Analytics in the Cloud–Are You Ready? Oct 12 2021 5:00 pm UTC 25 mins
    Alex Pierce, Pepperdata Field Engineer
    Big data with cloud computing is a powerful combination that can transform your organization, process and analyze your big data faster, and improve your products and business with actionable insights. Bringing your big data cluster to the cloud presents huge opportunities, but there are some challenges that need to be overcome. Is your organization really ready for the complexity of managing big data analytics in the cloud?

    Most big data enterprises have either adopted cloud computing to improve IT operations and develop better software faster, or have an initiative to get there. Preparing well for the move is the difference between realizing the ROI the cloud promises and struggling with impatient stakeholders and SLAs, or possibly moving back to an on-premises solution.

    Creating an accurate cloud footprint requires good planning, a deep understanding of resource utilization, and granular data. In this webinar, we’ll discuss how to prepare and ensure that your organization has a solid plan to manage big data analytics in the cloud.

    Topics include:

    – Primary characteristics of big data and putting your data in the cloud
    – The challenges of managing big data performance in the cloud
    – FinOps (chargeback, analyzing wasted spend)
    – Planning for day 2
    – Achieving cloud performance
    – Observability, continuous tuning, and managed autoscaling
  • Spark Performance Tuning on Kubernetes Best Practices (Part 2) Sep 28 2021 5:00 pm UTC 45 mins
    Alex Pierce
    Running Spark on Kubernetes in a stable, performant, cost-efficient, and secure manner presents complex challenges. Spark performance tuning on Kubernetes ensures you get the best performance by optimizing system resources and tuning configurations.

    Join Pepperdata Field Engineer Alex Pierce as he discusses how to reduce the complexity of monitoring and managing Spark on Kubernetes with autonomous optimization and full-stack observability.

    Topics include:

    • Automation and observability for lowering costs and improving performance
    • Deploying, managing, monitoring, and simplifying Spark on Kubernetes: big data application monitoring, platform monitoring, and dynamic optimization
    • Configuring for performance and efficiency
    • Spark app-level dynamic allocation and cluster-level autoscaling
    • What Spark on Kubernetes performance success looks like
  • Eliminate Waste and Lower Cloud Costs for GPU-Accelerated Big Data Applications Recorded: Sep 14 2021 16 mins
    Alex Pierce, Pepperdata Field Engineer
    Cloud GPUs are quickly becoming mainstream for big data applications like Spark on Kubernetes. Big data companies looking to improve scalability, speed, and cost, as well as the energy and rack-space footprint of their big data systems, have turned their attention and budgets to GPUs. Although the massively parallel computing power of GPUs significantly speeds up data-intensive ML and AI workloads, costs can spiral out of control.

    Join Pepperdata Field Engineer Alex Pierce for a webinar on gaining visibility into cloud GPU resource utilization at the application level and improving the performance of your GPU-accelerated big data applications.

    Topics include:

    - Why GPU-accelerated big data applications are going mainstream
    - Getting visibility into GPU memory usage and waste
    - Fine-tuning GPU usage through end-user recommendations
    - Managing costs at a granular level: attributing usage and cost to specific end users
    - Monitoring and eliminating waste with GPU monitoring solutions
  • Big Data Cloud Automation: Navigating through the Noise of Recommendations Recorded: Sep 7 2021 45 mins
    Joel Stewart, Pepperdata Vice President of Customer Success
    The ability to manage and optimize big data cloud performance has significantly improved through the implementation of automation and observability. This includes the capacity to provide recommendations to improve performance. However, understanding, trusting, and utilizing recommendations to optimize and improve big data performance remains a significant challenge.

    When bad data or the wrong data gets fed into big data algorithms, bad results can occur, and bad decisions can get made. With the scale of data and a myriad of overlapping dependencies and moving pieces, it’s not that surprising that infrastructure and application recommendations intended to help you improve performance often only add to the noise and hamper insight.

    Join Pepperdata Vice President of Customer Success Joel Stewart as he discusses how to navigate through the noise of big data cloud performance recommendations and more efficiently manage big data performance.
  • Benchmarking in the Cloud: How Is Your Big Data Performance? Recorded: Aug 17 2021 21 mins
    Heidi Carson, Pepperdata Product Manager
    Are you getting the best price/performance from your big data cloud solution? Which cloud performance benchmarks do you use? Using cloud performance benchmarks and measuring your performance are key to understanding price/performance. This webinar will discuss the importance of understanding key benchmark metrics and how you can use benchmarking to improve performance. Learn the following:

    - Current big data benchmarking methods and trends
    - Preferred benchmarking datasets
    - Best practices for evaluating benchmarking reports

    Presenter Heidi Carson will also share recent performance benchmark results that demonstrate the benefits of big data cloud performance management solutions on top of Amazon EMR Custom Auto Scaling.
  • Optimize Spark Performance on Kubernetes - Part 1 Recorded: Aug 10 2021 15 mins
    Alex Pierce, Pepperdata Field Engineer
    Running Spark on Kubernetes is growing in popularity. Reasons for the growth are improved isolation, better resource sharing, and the ability to leverage homogeneous and cloud-native infrastructure for the entire stack.

    But running Spark on Kubernetes in a stable, performant, cost-efficient, and secure manner also presents specific challenges. In this webinar, Alex Pierce will talk about how to optimize Spark performance on Kubernetes. Topics include:

    – Making Spark-on-k8s reliable at scale
    – Core concepts and setup of Spark on Kubernetes
    – Configuration tips for performance and efficient resource sharing
    – Spark app-level dynamic allocation and cluster-level autoscaling
    – Monitoring and security best practices
  • Presto Performance Best Practices—Get Visibility Into Your Presto Queries Recorded: Jul 20 2021 23 mins
    Pepperdata Field Engineer, Alex Pierce
    Presto is the go-to query engine of many big data customers for interactive and reporting use cases due to its excellent performance and ability to join unstructured and structured data in seconds. Many Pepperdata customers get visibility into their Presto queries to explore data and run queries in one place with continuous, automated application and infrastructure tuning.

    Managing cluster performance to meet business requirements and performance SLAs is complex. You want best-in-class performance to meet the short deadlines of interactive workloads while reducing or controlling costs. If a single query fails to complete because of query-level inefficiencies, data skew, missing or old statistics, or resource configurations, that single resource-consuming query can negatively impact the entire application stack on that cluster.

    Join Pepperdata Field Engineer Alex Pierce for this webinar on Presto performance management best practices to learn:

    - When to use Presto versus other engines
    - Key criteria to look for in a query engine for interactive analytics
    - How to get visibility into all of your queries in one place with continuous, automated application and infrastructure tuning
    - How to enable self-service access to your data lake
    - How to immediately improve and scale application performance through automated tuning
    - How to improve query performance through job-specific recommendations, query run comparisons, and IT chargeback reports
  • Kafka Performance: Best Practices for Monitoring and Improving Recorded: Jul 13 2021 48 mins
    Kirk Lewis
    Kafka performance relies on implementing continuous intelligence and real-time analytics. It is important to be able to ingest and check the data and to make timely business decisions.

    Stream processing systems provide a unified, high-performance architecture that processes real-time data feeds and guarantees system health. But performance and reliability are challenging. IT managers, system architects, and data engineers must address challenges with Kafka capacity planning to ensure the successful deployment, adoption, and performance of a real-time streaming platform. When something breaks, it can be difficult to restore service, or even to know where to begin.

    This webinar discusses best practices to overcome critical performance challenges for Kafka data streaming that can negatively impact the usability, operation, and maintenance of the platform, as well as the data and devices connected to it. Topics include: Kafka data streaming architecture, key monitoring metrics, offline partitioning, broker, topics, consumer groups, and topic lag.
  • Drive Cloud Performance on Amazon EMR with Autoscaling Recorded: Jun 29 2021 24 mins
    Alex Pierce
    Autoscaling automatically increases or decreases the computational resources delivered to a cloud workload based on need. This typically means adding or reducing active servers (instances) that are leveraged against your workload within an infrastructure. The promise of autoscaling is that workloads receive exactly the cloud computational resources they require at any given time, and you only pay for the server resources you need, when you need them.

    Autoscaling enables applications to perform their best when demand changes, but depending on the application, performance varies. While some applications are constant and predictable, others are bound by CPU or memory, or are “spiky” in nature. Autoscaling automatically addresses these variables to ensure optimal application performance. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, each with a different approach.

    While autoscaling provides the elasticity that customers require for their big data workloads, it can also lead to runaway waste, exorbitant cost, and management complexity. Estimating the right number of cluster nodes for a workload is difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive.

    Join Pepperdata Field Engineer Alex Pierce for this discussion about operational challenges associated with maintaining optimal big data performance in the cloud, what milestones to set, and recommendations on how to create a successful cloud migration framework. Learn the following:

    – Autoscaling types
    – Autoscaling strengths and weaknesses
    – When to use autoscaling and what autoscaling does well
    – Is traditional autoscaling limiting your success?
    – What is optimized cloud autoscaling?
    – What does cloud autoscaling success look like?
  • Big Data Self-Service Performance Analytics: Best Practices Recorded: Jun 15 2021 26 mins
    Kirk Lewis
    Big data self-service analytics is the solution to two critical issues: the proliferation of data and the subsequent shortage of data scientists to capture, manage, and analyze it all.

    To bridge the gaps and to take business analytics beyond what legacy reporting tools can do, many organizations are implementing self-service solutions that enable users to extract more value from ever-growing data volumes.

    When today’s cloud platforms are combined with modern big data performance solutions, data analysis power users can leverage self-service to gain business insights, optimize scaling, and create a unified interface to simplify analysis. Join us as we discuss the best practices for simplifying big data analytics while providing data analysts and scientists with self-service access on the AWS cloud.

    Watch this webinar to:
    • Understand why more organizations are moving to the self-service analytics model.
    • Learn how to more easily create elastic Hadoop, Spark, and other big data clusters for dynamic, large-scale workloads.
    • Learn the best practices for cost optimization of big data workloads.
    • Understand how to evaluate big data SaaS criteria and determine whether “as-a-service” is right for your organization.
  • Fix Spark Performance Issues Without Thinking Too Hard Recorded: Jun 8 2021 27 mins
    Heidi Carson and Alex Pierce
    This discussion explores the results of analyzing thousands of Spark jobs on many multi-tenant production clusters. We will discuss common issues we have seen, the symptoms of those issues, and how you can address and overcome them without thinking too hard.

    Pepperdata big data performance management gathers trillions of performance data points on hundreds of production clusters running Spark, covering a variety of industries, applications, and workload types.

    Based on analyzing the behavior and performance of thousands of Spark applications and use case data from the Pepperdata Big Data Performance report, Heidi and Alex will discuss key performance insights. Topics include best and worst practices, gotchas, machine learning, and tuning recommendations.
  • Impala Performance Best Practices—Get Visibility into Hive and Impala Queries Recorded: Jun 1 2021 21 mins
    Alex Pierce
    Get insights into Impala performance best practices to get visibility into all of your Hive and Impala queries in one place with continuous, automated application and infrastructure tuning.

    Enterprises across all industries are heavily invested in big data infrastructure (Hadoop, Impala, Spark, Kafka, etc.) as they attempt to convert data into insights and business value. Data proliferates quickly, and managing cluster performance to meet business requirements and performance SLAs can quickly become complex.

    For example, if a single query fails to complete because of query-level inefficiencies, data skew, missing or old statistics, or resource configurations, that single resource-consuming query can negatively impact the entire application stack on that cluster.

    Join Pepperdata Field Engineer Alex Pierce for this webinar on Impala performance recommendations and learn how to:

    —Get visibility into all of your Hive and Impala queries in one place with continuous, automated application and infrastructure tuning.
    —Immediately improve and scale application performance through automated tuning.
    —Improve query performance through job-specific recommendations, query run comparisons and IT chargeback reports.
  • Leveraging Application Observability to Simplify Cloud Migration Recorded: May 18 2021 62 mins
    David Loshin, TDWI Analyst, and Joel Stewart, Pepperdata Vice President of Customer Success
    The potential value of migrating to the cloud has inspired many organizations to transition their on-premises big data platforms to a cloud-based platform. Although it seems obvious to migrate big data workloads away from the on-premises data center to the cloud, simply moving your application does not necessarily ensure that your organization can immediately reap the benefits.

    When it comes to overseeing large complex systems supporting big data applications, the cloud presents a very different operational and management paradigm than a fully on-premises implementation. Even when engineers experienced with systems management relearn techniques for monitoring and troubleshooting cloud-based applications, it is clear there is no way to manually mitigate emerging risks.

    In this webinar, we consider key aspects of cloud system observability and management, including cost management, ensuring compliance with agreed service-level agreements (SLAs), and balancing cloud resource usage to optimize performance. Attendees will learn about:

    - The concept of observability for overseeing cloud-based big data applications
    - Efficiently and autonomously optimizing big data environments at scale
    - Automating performance analysis to help identify performance issues
    - Determining where system complexity introduces congestion and bottlenecks that impact observing SLAs
    - Understanding cloud service use and optimizing cloud service costs
  • Optimize Spark Performance on Kubernetes Recorded: Apr 20 2021 15 mins
    Alex Pierce, Pepperdata Field Engineer
    Running Spark on Kubernetes is growing in popularity. Reasons for the growth are improved isolation, better resource sharing, and the ability to leverage homogeneous and cloud-native infrastructure for the entire stack.

    But running Spark on Kubernetes in a stable, performant, cost-efficient, and secure manner also presents specific challenges. In this webinar, Alex Pierce will talk about how to optimize Spark performance on Kubernetes. Topics include:

    – Making Spark-on-k8s reliable at scale
    – Core concepts and setup of Spark on Kubernetes
    – Configuration tips for performance and efficient resource sharing
    – Spark app-level dynamic allocation and cluster-level autoscaling
    – Monitoring and security best practices
  • Simplify Kubernetes Performance Management with End-to-End Visibility Recorded: Apr 13 2021 39 mins
    Kirk Lewis, Pepperdata Field Engineer
    Complex applications running on Kubernetes scale super fast, but this can create visibility gaps that can make detecting and troubleshooting Kubernetes issues as difficult as finding a needle in a haystack. Although Docker and Kubernetes are now becoming standard components when building and orchestrating applications, you’re still responsible for managing the performance of applications built atop this new stack.

    With many companies prioritizing containers for more applications and more uses, an increasing area of concern for everyone in IT is finding a way to monitor, manage, and optimize performance across these sprawling environments. Join this webinar to learn:

    – A brief history of current trends in computing, cloud, containerization, and Kubernetes
    – Challenges: virtualization, distributed applications, and multi-cloud
    – How to meet the demands of new microservices apps while maintaining legacy apps
    – How to deploy, manage, monitor, and simplify: big data analytics monitoring, platform monitoring, and dynamic optimization
    – Ways to reduce the complexity of monitoring and managing Kubernetes with automated full-stack observability
    – What Kubernetes performance management success looks like
  • Proven Approaches to Hive Query Tuning Recorded: Apr 6 2021 45 mins
    Kirk Lewis, Pepperdata Field Engineer
    Apache Hive is a powerful tool frequently used to analyze data while handling ad-hoc queries and regular ETL workloads. Although Hive is one of the more mature solutions in the Hadoop ecosystem, developers, data scientists, and IT operators are still unable to avoid common inefficiencies when running it at scale. Inefficient queries can mean missed SLAs, negative impact on other users, and slow database resources. Poorly tuned platforms or poorly sized queues can cause even efficient queries to suffer.

    This webinar discusses proven approaches to Hive query tuning that improve query speed and reduce cost. Learn how to understand the detailed performance characteristics of query workloads and the infrastructure-wide issues that impact these workloads.

    Pepperdata Field Engineer Kirk Lewis will discuss:

    - Finding problem queries: pinpointing delayed queries, expensive queries, and queries that waste CPU and memory
    - Improving query utilization and performance with database and infrastructure metrics
    - Ensuring your infrastructure is not adversely impacting query performance
  • Managing Big Data Analytics in the Cloud–Are You Ready? Recorded: Mar 16 2021 25 mins
    Alex Pierce, Pepperdata Field Engineer
    Big data with cloud computing is a powerful combination that can transform your organization, process and analyze your big data faster, and improve your products and business with actionable insights. Bringing your big data cluster to the cloud presents huge opportunities, but there are some challenges that need to be overcome. Is your organization really ready for the complexity of managing big data analytics in the cloud?

    Most big data enterprises have either adopted cloud computing to improve IT operations and develop better software faster, or have an initiative to get there. Preparing well for the move is the difference between realizing the ROI the cloud promises and struggling with impatient stakeholders and SLAs, or possibly moving back to an on-premises solution.

    Creating an accurate cloud footprint requires good planning, a deep understanding of resource utilization, and granular data. In this webinar, we’ll discuss how to prepare and ensure that your organization has a solid plan to manage big data analytics in the cloud.

    Topics include:

    – Primary characteristics of big data and putting your data in the cloud
    – The challenges of managing big data performance in the cloud
    – FinOps (chargeback, analyzing wasted spend)
    – Planning for day 2
    – Achieving cloud performance
    – Observability, continuous tuning, and managed autoscaling
Performance Management for Big Data
Pepperdata is the Big Data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiencies, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer Big Data systems on-premises and in the cloud.

  • Title: When Cloud Costs Run Amok: Big Data Architect’s Worst Nightmare
  • Live at: Aug 3 2021 5:00 pm
  • Presented by: Pepperdata Field Engineer, Kirk Lewis