Apache Hive is a powerful tool frequently used to analyze data, handling both ad-hoc queries and regular ETL workloads. Although Hive is one of the more mature solutions in the Hadoop ecosystem, developers, data scientists, and IT operators still run into common inefficiencies when running it at scale. Inefficient queries can mean missed SLAs, a negative impact on other users, and slowed database resources. Poorly tuned platforms or poorly sized queues can cause even efficient queries to suffer.
This webinar discusses proven approaches to Hive query tuning that improve query speed and reduce cost. Learn how to understand the detailed performance characteristics of query workloads and the infrastructure-wide issues that impact these workloads.
Pepperdata Field Engineer Kirk Lewis will discuss:
- Finding problem queries: pinpointing delayed queries, expensive queries, and queries that waste CPU and memory
- Improving query utilization and performance with database and infrastructure metrics
- Ensuring your infrastructure is not adversely impacting query performance