Proven Approaches to Hive Query Tuning

Apache Hive is a powerful tool frequently used to analyze data while handling ad-hoc queries and regular ETL workloads. Although Hive is one of the more mature solutions in the Hadoop ecosystem, developers, data scientists, and IT operators still struggle to avoid common inefficiencies when running it at scale. Inefficient queries can mean missed SLAs, negative impact on other users, and slowed database resources. Poorly tuned platforms or poorly sized queues can cause even efficient queries to suffer.

This webinar discusses proven approaches to Hive query tuning that improve query speed and reduce cost. Learn how to analyze the detailed performance characteristics of query workloads and identify the infrastructure-wide issues that affect them.

Pepperdata Field Engineer Kirk Lewis will discuss:

- Finding problem queries: pinpointing delayed queries, expensive queries, and queries that waste CPU and memory
- Improving query utilization and performance with database and infrastructure metrics
- Ensuring your infrastructure is not adversely impacting query performance
Recorded Jun 9 2020 46 mins
Presented by
Kirk Lewis, Pepperdata Field Engineer
  • Big Data Self-Service Performance Analytics: Best Practices May 11 2021 5:00 pm UTC 45 mins
    Kirk Lewis
    Big data self-service analytics is the solution to two critical issues that big data organizations are experiencing: the proliferation of data and the subsequent shortage of data scientists to capture, manage, and analyze it all.

    To bridge the gaps and to take business analytics beyond what legacy reporting tools can do, many organizations are implementing self-service solutions that enable users to extract more value from ever-growing data volumes.

    When today’s cloud platforms are combined with modern big data performance solutions, data analysis power users can leverage self-service to quickly gain business insights, optimize scaling, and create a unified interface to simplify analysis. Join us as we discuss the best practices for simplifying big data analytics while providing data analysts and scientists with self-service access on AWS cloud.

    Watch this webinar to:

    • Understand why more organizations are moving to the self-service analytics model.
    • Learn how to more easily create elastic Hadoop, Spark, and other big data clusters for dynamic, large-scale workloads.
    • Learn the best practices for cost optimization of big data workloads.
    • Understand how to evaluate big data SaaS criteria and determine whether “as-a-service” is right for your big data.
    • Learn the best practices for implementing big data self-service analytics.
  • Optimize Spark Performance on Kubernetes Apr 20 2021 5:00 pm UTC 45 mins
    Alex Pierce, Pepperdata Field Engineer
    Running Spark on Kubernetes is growing in popularity. Reasons for this growth include improved isolation, better resource sharing, and the ability to leverage homogeneous, cloud-native infrastructure for the entire stack.

    But running Spark on Kubernetes in a stable, performant, cost-efficient, and secure manner also presents specific challenges. In this webinar, Alex Pierce will talk about how to optimize Spark performance on Kubernetes. Topics include:

    – Making Spark-on-k8s reliable at scale
    – Core concepts and setup of Spark on Kubernetes
    – Configuration tips for performance and efficient resource sharing
    – Spark application-level dynamic allocation and cluster-level autoscaling
    – Monitoring and security best practices
  • Simplify Kubernetes Performance Management with End-to-End Visibility Apr 13 2021 5:00 pm UTC 45 mins
    Kirk Lewis, Pepperdata Field Engineer
    Complex applications running on Kubernetes scale very quickly, but this can create visibility gaps that make detecting and troubleshooting Kubernetes issues as difficult as finding a needle in a haystack. Although Docker and Kubernetes are becoming standard components for building and orchestrating applications, you’re still responsible for managing the performance of applications built atop this new stack.

    With many companies prioritizing containers for more applications and more uses, an increasing area of concern for everyone in IT is finding a way to monitor, manage, and optimize performance across these sprawling environments. Join this webinar to learn:

    – A brief history of current trends in computing, cloud, containerization, and Kubernetes
    – Challenges: virtualization, distributed applications, and multi-cloud
    – How to meet the demands of new microservices apps while maintaining legacy apps
    – How to deploy, manage, monitor, and simplify: big data analytics monitoring, platform monitoring, and dynamic optimization
    – Ways to reduce the complexity of monitoring and managing Kubernetes with automated full-stack observability
    – What Kubernetes performance management success looks like
  • Proven Approaches to Hive Query Tuning Recorded: Apr 6 2021 45 mins
    Kirk Lewis, Pepperdata Field Engineer
  • Managing Big Data Analytics in the Cloud–Are You Ready? Recorded: Mar 16 2021 25 mins
    Alex Pierce, Pepperdata Field Engineer
    Big data with cloud computing is a powerful combination that can transform your organization, process and analyze your big data faster, and improve your products and business with actionable insights. Bringing your big data cluster to the cloud presents huge opportunities, but there are some challenges that need to be overcome. Is your organization really ready for the complexity of managing big data analytics in the cloud?

    Most big data enterprises have either adopted cloud computing to improve IT operations and develop better software faster, or they have an initiative to get there. Preparing well for the move can make the difference between realizing the ROI the cloud promises and struggling with impatient stakeholders, missed SLAs, and possibly a move back to an on-premises solution.

    Creating an accurate cloud footprint requires good planning, a deep understanding of resource utilization, and granular data. In this webinar, we’ll discuss how to prepare and ensure that your organization has a solid plan to manage big data analytics in the cloud.

    Topics include:

    – Primary characteristics of big data and putting your data in the cloud
    – The challenges of managing big data performance in the cloud
    – FinOps (chargeback, analyzing wasted spend)
    – Planning for day 2
    – Achieving cloud performance
    – Observability, continuous tuning, and managed autoscaling
  • The Future of Big Data: A Perspective from IT Leaders Transforming IT Ops Recorded: Mar 2 2021 38 mins
    Ahmed Kamran Imadi, Fortune 100 Finserv, Mark Kidwell, Autodesk, Satish Nekkalapudi, Magnite, Joel Stewart, Pepperdata
    Have you changed the way you use big data in your business? Understanding the rapid pace of data usage across your organization and planning for the future of big data is a key skill. Sometimes we all need a little insight.

    During this webinar, hear from industry leaders Ahmed Kamran Imadi, Big Data Solutions Engineering at a Fortune 100 financial institution; Mark Kidwell, Chief Data Architect at Autodesk; Satish Nekkalapudi, Sr. Manager at Magnite; and Joel Stewart, VP of Customer Success at Pepperdata, about the role big data plays in their businesses today and how they are adapting their IT ops and development teams to keep pace with change.

    Topics include:

    – What will big data’s role be in the future of business, and how will IT adapt and grow?
    – How will the growth in big data affect IT ops and developer processes today?
    – Will this change the skill sets for these roles?
    – What skills will IT need as the need for big data increases?
  • Big Data Cloud Performance Monitoring: Best Practices Recorded: Feb 23 2021 37 mins
    Kirk Lewis, Pepperdata Field Engineer
    Monitoring performance in the cloud with unified visibility across the entire ecosystem is critical to the success of any cloud deployment.

    In both the data center and cloud deployments, a proliferation of diverse performance solutions and microservice applications across infrastructures and networks can severely complicate cluster performance management. Hidden network dependencies and the complexity of managing many different solutions can negatively impact the application experience for both internal and external users.

    In this webinar, learn how you can gain visibility into cloud deployments, accelerate your cloud adoption, streamline IT operations, and deliver great customer experiences.

    By watching, you’ll learn how to:

    - Implement unified visibility.
    - Monitor the health of all your hosts, containers, and apps within one place.
    - Drill down into the stack, any app, anywhere, at scale.
    - Surface issues with infrastructure observability.
    - Reduce waste and cost.
    - Enhance operational efficiency.
    - Improve application reliability and collaborate across multiple cloud providers and stakeholders.
  • Controlling Cost and Complexity in the Cloud with Managed Autoscaling Recorded: Feb 16 2021 22 mins
    Alex Pierce, Pepperdata Field Engineer
    The promise of autoscaling is that workloads receive exactly the cloud computational resources they require at any given time, and you only pay for the server resources you need, when you need them.

    Autoscaling enables applications to perform their best when demand changes, but what performance means varies by application. While some applications are constant and predictable, others are CPU-bound, memory-bound, or “spiky” in nature. Autoscaling automatically addresses these variables to ensure optimal application performance. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but each takes a different approach.

    Estimating the right number of cluster nodes for a workload is difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive.

    Join Pepperdata Field Engineer Kirk Lewis for this discussion about the operational challenges associated with maintaining optimal big data performance in the cloud, what milestones to set, and recommendations on how to create a successful cloud migration framework. Learn the following:
    – What are the types of autoscaling?
    – What does autoscaling do well?
    – When should you use autoscaling?
    – Does traditional autoscaling limit your success?
    – What is optimized cloud autoscaling?
  • Big Data Observability – What Is It and How Do I Get It? Recorded: Feb 9 2021 21 mins
    Heidi Carson, Pepperdata PM
    Observability is an extremely popular topic these days. What's driving this interest? Why is observability needed? What is the difference between observability and monitoring?

    When IT Ops knows there is a problem but can't pinpoint it or quickly get to the root cause, traditional monitoring approaches are no longer enough. Achieving observability requires carefully correlating many different data sources, such as logs, metrics, and traces. This can present additional challenges in distributed environments that use containers and microservices.

    In this webinar, you’ll get the answers to these questions:

    - Why is observability essential in distributed big data environments?
    - What are the critical challenges of the multi-cloud and containerized world?
    - How can analytics stack performance solutions help you move from monitoring to observability?
  • How to Implement Cloud Observability Like a Pro Recorded: Jan 26 2021 24 mins
    Heidi Carson, Pepperdata Product Manager and Kirk Lewis, Pepperdata Field Engineer
    Do traditional on-prem observability techniques translate to the cloud? Many big data enterprises lack observability and thus struggle to manage and understand unprecedented amounts of data in the cloud. A monitoring solution may alert to a problem, but it can’t pinpoint the issue or quickly get to the root cause.

    Observability, by contrast, tells you why you have a problem and often provides a recommendation on how to quickly resolve it. Combined with ML and automation, observability delivers actionable answers to optimize cloud-native applications while also improving overall cluster performance. Observability is particularly challenging in cloud environments, where the old, manual, cluster-by-cluster approach may be insufficient and error-prone.

    In this webinar, you will learn three key techniques for achieving big data observability in the cloud.
  • Where is Big Data Going in 2021? Recorded: Jan 19 2021 42 mins
    Kirk Lewis, Pepperdata Field Engineer
    As corporate big data leaders look to improve data quality, turn around some of their big data projects in 2021, and optimize application and cluster performance to meet business objectives, big data and analytics remain essential resources for companies competing in a highly competitive environment.

    As you help your organization plan for the future and prepare for where big data is going in 2021, join Pepperdata Field Engineer Kirk Lewis for this webinar, where he will discuss the following:

    - How cloud technology will make big data more accessible
    - How cloud data will shape customer experiences
    - Kubernetes
    - Simplicity (one tool for each job)
    - Complexity (several tools)
    - Cost control (managing data and cloud sprawl)
  • Kafka Performance: Best Practices for Monitoring and Improving Recorded: Jan 12 2021 47 mins
    Kirk Lewis
    Kafka performance relies on implementing continuous intelligence and real-time analytics. It is important to be able to ingest and check data and to make timely business decisions.

    Stream processing systems provide a unified, high-performance architecture. This architecture processes real-time data feeds and guarantees system health. But performance and reliability are challenging. IT managers, system architects, and data engineers must address challenges with Kafka capacity planning to ensure the successful deployment, adoption, and performance of a real-time streaming platform. When something breaks, it can be difficult to restore service, or even know where to begin.

    This webinar discusses best practices to overcome critical performance challenges for Kafka data streaming that can negatively impact the usability, operation, and maintenance of the platform, as well as the data and devices connected to it. Topics include: Kafka data streaming architecture, key monitoring metrics, offline partitions, brokers, topics, consumer groups, and topic lag.
  • Best Practices for Spark Performance Management Recorded: Dec 22 2020 28 mins
    Alex Pierce, Field Engineer at Pepperdata
    Gain the knowledge of Spark veteran Alex Pierce on how to manage the challenges of maintaining the performance and usability of your Spark jobs.

    Apache Spark provides more sophisticated ways for enterprises to leverage big data than Hadoop does. However, the amount of data being analyzed and processed through the framework is massive and continues to push the boundaries of the engine.

    This webinar draws on experiences across dozens of production deployments and explores the best practices for managing Apache Spark performance. Learn how to avoid common mistakes and improve the usability, supportability, and performance of Spark.

    Topics include:

    – Serialization
    – Partition sizes
    – Executor resource sizing
    – DAG management
  • What Does a CTO Do When a 60PB Hadoop Cluster Devours the IT Budget? Recorded: Dec 15 2020 36 mins
    Hitachi Vantara Sr Director of Product Marketing, Chuck Yarbrough and Pepperdata Field Engineer, Alex Pierce
    In 2019, the CTO of a large global bank recognized a problem: the bank's data continued to grow, costs for its Hadoop cluster rapidly escalated, and these costs started eating into the annual IT budget. Moving off Hadoop, or a “lift-and-shift,” was out of the question. The bank needed a way to cap its cost and growth without impacting its ability to remain market competitive.

    Learn how you can expose, simplify, and solve the problems created by large big data clusters. Save time and money, and ensure compliance.
  • How to Save Even More with Qubole Recorded: Nov 17 2020 13 mins
    Alex Pierce
    Cloud providers make managing big data look easy, but autoscaling is wasteful and inefficient. Qubole takes advantage of the separation between compute and storage to help its customers reduce their cloud spend. However, with managed autoscaling, Qubole customers can use cloud computing resources even more efficiently, pay only for what they use, and avoid over-provisioning servers and virtual machines.

    In this webinar, presenter Alex Pierce will use customer examples to demonstrate how Qubole customers can automatically improve infrastructure utilization and gain more throughput with Pepperdata big data performance management solutions.
  • The Future of Big Data: A Perspective from IT Leaders Transforming IT Ops Recorded: Nov 10 2020 39 mins
    Ahmed Kamran Imadi, Fortune 100 Finserv, Mark Kidwell, Autodesk, Satish Nekkalapudi, Magnite, Joel Stewart, Pepperdata
  • Autoscaling Big Data Operations in the Cloud Recorded: Oct 27 2020 29 mins
    Kirk Lewis
    The ability to scale the number of nodes in your cluster up and down on the fly is among the major features that make cloud deployments attractive. Estimating the right number of cluster nodes for a workload is difficult; user-initiated cluster scaling requires manual intervention, and mistakes are often costly and disruptive.

    Autoscaling enables applications to perform their best when demand changes. But the definition of performance varies, depending on the app. Some are CPU-bound, others memory-bound. Some are “spiky” in nature, while others are constant and predictable. Autoscaling automatically addresses these variables to ensure optimal application performance. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but each takes a different approach.

    Pepperdata Field Engineer Kirk Lewis will discuss the operational challenges associated with maintaining optimal big data performance, what milestones to set, and recommendations on how to create a successful cloud migration framework. Topics include:

    – Types of scaling
    – What does autoscaling do well? When should you use it?
    – Does traditional autoscaling limit your success?
    – What is optimized cloud autoscaling?
  • Fix Spark Performance Issues Without Thinking Too Hard Recorded: Oct 13 2020 27 mins
    Heidi Carson and Alex Pierce
    This discussion explores the results of analyzing thousands of Spark jobs on many multi-tenant production clusters. We will discuss common issues we have seen, the symptoms of those issues, and how you can address and overcome them without thinking too hard.

    Pepperdata big data performance management gathers trillions of performance data points on hundreds of production clusters running Spark, covering a variety of industries, applications, and workload types.

    Based on analyzing the behavior and performance of thousands of Spark applications and use case data from the Pepperdata Big Data Performance report, Heidi and Alex will discuss key performance insights. Topics include best and worst practices, gotchas, machine learning, and tuning recommendations.
  • Kafka Performance: Best Practices for Monitoring and Improving Recorded: Sep 29 2020 48 mins
    Kirk Lewis
  • Top Considerations When Choosing a Big Data Performance Management Solution Recorded: Sep 15 2020 22 mins
    Alex Pierce
    Growing adoption of Hadoop and Spark has increased demand for big data performance management solutions that operate at scale. However, enterprise organizations quickly realize that scaling from pilot projects to large-scale production clusters involves a steep learning curve. Despite progress, DevOps teams still struggle with multi-tenancy, cluster performance, and workflow monitoring.

    In this webinar, Field Engineer Alex Pierce discusses the key things to consider when choosing a big data performance management solution. Learn how to:

    – Maximize your infrastructure investment
    – Achieve up to a 50 percent increase in throughput and run more jobs on existing infrastructure
    – Ensure cluster stability and efficiency
    – Avoid overspending on unnecessary hardware
    – Spend less time in backlog queues

    Learn how to automatically tune and optimize your cluster resources, and recapture wasted capacity. Alex will walk through use case examples to demonstrate the types of results you can expect to achieve in your own big data environment.
Performance Management for Big Data
Pepperdata is the big data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiency, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer big data systems on-premises and in the cloud.

  • Title: Proven Approaches to Hive Query Tuning
  • Live at: Jun 9 2020 5:00 pm
  • Presented by: Kirk Lewis, Pepperdata Field Engineer