
Capacity Manager Q and A – How to Improve Productivity, Throughput, and Uptime

There are numerous challenges to leveraging your big data infrastructure for optimal performance. This webinar answers operational management questions around optimizing performance and maximizing capacity, such as “Who’s blowing up our cluster?”, “How can I run more applications?” and more. You will learn from our expert, based on real-world deployments, how a complete APM solution delivers:

– Improved throughput, uptime, efficiency, and performance.
– Accurate capacity planning.
– Accurately deployed capacity for predictable performance.
– Recaptured wasted resources that maximize your current infrastructure.


Kirk joined Pepperdata in 2015. Previously, he was a Solutions Engineer at StackVelocity. Before that he was the lead technical architect for big data production platforms at American Express. Kirk has a strong background in big data.
Recorded Oct 10 2018 34 mins
Presented by
Kirk Lewis, Field Engineer
  • Monitor and Improve Kafka Performance Sep 15 2020 5:00 pm UTC 45 mins
    Kirk Lewis
    Kafka performance relies on continuous intelligence and real-time big data analytics. As data volumes surge, understanding how to quickly ingest and evaluate data and make timely business decisions is critical to success.

    While stream processing systems provide a unified, high-performance architecture for processing real-time data feeds, guaranteeing system health, performance, and reliability is challenging. IT managers, system architects, and data engineers must address challenges including Kafka capacity planning to ensure the successful deployment, adoption, and performance of a real-time streaming platform. When something breaks, it can be difficult to restore service quickly, or even know where to begin.

    Pepperdata Streaming Spotlight shows you near real-time Kafka performance metrics so you can quickly find and resolve bottlenecks, slowdowns, and failures on the Kafka platform. This webinar discusses how to overcome critical performance challenges for Kafka data streaming that can negatively impact the usability, operation, and maintenance of the platform, as well as the data and devices connected to it.
  • Spark Recommendations – Optimize Application Performance and Build Expertise Aug 25 2020 5:00 pm UTC 45 mins
    Heidi Carson, Pepperdata Product Manager, and Alex Pierce, Pepperdata Field Engineer
    Does your big data analytics platform provide you with the Spark recommendations you need to optimize your application performance and improve your own skillset? Explore how you can use Spark recommendations to untangle the complexity of your Spark applications, reduce waste and cost, and enhance your own knowledge of Spark best practices.

    Topics include:

    - Avoiding contention by ensuring your Spark applications request the appropriate amount of resources
    - Preventing memory errors
    - Configuring Spark applications for optimal performance
    - Real-world examples of impactful recommendations
    - And more

    Join Product Manager Heidi Carson and Field Engineer Alex Pierce from Pepperdata to gain real-world experience with a variety of Spark recommendations, and participate in the Q and A that follows.
  • Reduce the Runaway Waste and Cost of Autoscaling Aug 11 2020 5:00 pm UTC 45 mins
    Kirk Lewis
    Autoscaling is the process of automatically increasing or decreasing the computational resources delivered to a cloud workload based on need. This typically means adding or reducing active servers (instances) that are leveraged against your workload within an infrastructure. The promise of autoscaling is that workloads should get exactly the cloud computational resources they require at any given time, and you only pay for the server resources you need, when you need them. Autoscaling provides the elasticity that customers require for their big data workloads, but it can also lead to exorbitant runaway waste and cost.

    Pepperdata provides automated deployment options that can be seamlessly added to your Amazon EMR, Google Dataproc, and Qubole environments to recapture waste and reduce cost. Join us for this webinar where we will discuss how DevOps can use managed autoscaling to be even more efficient in the cloud. Topics include:

    – Types of scaling
    – What does autoscaling do well? When should you be using it?
    – Is traditional autoscaling limiting your big data success?
    – What is missing? Why is this problem important?
    – Managed cloud autoscaling with Pepperdata Capacity Optimizer
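The scale-up/scale-down tradeoff described above can be sketched as a simple threshold policy. This is an illustrative sketch only: the thresholds, step size, and function name are assumptions for the example, not Pepperdata's algorithm or any cloud provider's policy format.

```python
# Minimal sketch of a threshold-based autoscaling decision.
# All thresholds and the 20% step are hypothetical values.

def autoscale_decision(utilization: float, current_nodes: int,
                       min_nodes: int = 3, max_nodes: int = 50,
                       scale_up_at: float = 0.80,
                       scale_down_at: float = 0.30) -> int:
    """Return the new node count for a cluster given its current utilization."""
    if utilization > scale_up_at and current_nodes < max_nodes:
        # Busy: add roughly 20% more capacity, capped at the maximum.
        return min(max_nodes, current_nodes + max(1, current_nodes // 5))
    if utilization < scale_down_at and current_nodes > min_nodes:
        # Idle: shed capacity gradually to avoid thrashing.
        return max(min_nodes, current_nodes - max(1, current_nodes // 5))
    return current_nodes

print(autoscale_decision(0.9, 10))  # 12 (scale up)
print(autoscale_decision(0.2, 10))  # 8  (scale down)
print(autoscale_decision(0.5, 10))  # 10 (hold steady)
```

Note how naive scale-down is the source of the "runaway waste" problem: a policy that only looks at instantaneous utilization can hold idle nodes for hours, which is why managed approaches look at actual workload demand instead.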
  • IT Cost Optimization with Big Data Analytics Performance Management Recorded: Jul 28 2020 34 mins
    Alex Pierce, Pepperdata Field Engineer
    Big data analytics performance management is a competitive differentiator and a priority for data-driven companies. However, optimizing IT costs while guaranteeing performance and reliability in distributed systems is difficult. The complexity of distributed systems makes it critically important to have unified visibility into the entire stack. This webinar discusses how to maximize the business value of your big data analytics stack investment and achieve ROI while reducing expenses. Learn how to:

    - Correlate visibility across big data applications and infrastructure for a complete and transparent view of performance and cost.
    - Continuously tune your platform, and run up to 50% more jobs on Hadoop clusters.
    - Optimally utilize resources, and ensure customer satisfaction.
    - Simplify troubleshooting and problem resolution while resolving issues to meet SLAs.

    In this webinar, learn specific ways to automatically tune and optimize big data cluster resources, recapture wasted capacity, and improve ROI for your big data analytics stack.
  • Big Data Observability - What Is It and How Do I Get It? Recorded: Jul 14 2020 21 mins
    Heidi Carson, Pepperdata PM
    Observability is an extremely popular topic these days. What's driving this interest? Why is observability needed? What is the difference between observability and monitoring?

    When IT Ops knows there is a problem but can't pinpoint or quickly get to the root cause, traditional monitoring approaches are no longer enough. Achieving observability requires carefully correlating many different data sources: logs, metrics, and traces. This can present additional challenges in distributed environments that use containers and microservices.

    In this webinar, you’ll get the answers to these questions:

    - Why is observability essential in distributed big data environments?
    - What are the critical challenges of the multi-cloud and containerized world?
    - How can analytics stack performance solutions help you move from monitoring to observability?
  • Best Practices for Spark Performance Management Recorded: Jun 23 2020 29 mins
    Alex Pierce, Field Engineer at Pepperdata
    Learn from Spark veteran Alex Pierce how to manage the challenges of maintaining the performance and usability of your Spark jobs.

    Apache Spark provides more sophisticated ways for enterprises to leverage Big Data than Hadoop. However, the amount of data being analyzed and processed through the framework is massive and continues to push the boundaries of the engine.

    This webinar draws on experiences across dozens of production deployments and explores the best practices for managing Apache Spark performance. Learn how to avoid common mistakes and improve the usability, supportability, and performance of Spark.

    Topics include:

    – Serialization
    – Partition sizes
    – Executor resource sizing
    – DAG management
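Most of the topics above map onto standard Spark configuration keys. As a hedged illustration (the values and the `my_job.py` script name are placeholders to size against your own workload, not tuning recommendations), a submission might look like:

```python
# Illustrative spark-submit settings touching the webinar topics.
# DAG management (caching, avoiding wide shuffles) is a code-level
# concern and has no single config key, so it is not shown here.

conf = {
    # Serialization: Kryo is typically faster and more compact than
    # Java serialization.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Partition sizes: shuffle partition count should reflect data
    # volume and available cores (placeholder value).
    "spark.sql.shuffle.partitions": "400",
    # Executor resource sizing (placeholder values).
    "spark.executor.memory": "8g",
    "spark.executor.cores": "4",
}

cmd = ["spark-submit"]
for key, value in sorted(conf.items()):
    cmd += ["--conf", f"{key}={value}"]
cmd.append("my_job.py")  # hypothetical application script
print(" ".join(cmd))
```

The point of the webinar is that the right values for these knobs depend on the workload; the keys themselves are stable Spark configuration properties.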
  • Proven Approaches to Hive Query Tuning Recorded: Jun 9 2020 46 mins
    Kirk Lewis, Pepperdata Field Engineer
    Apache Hive is a powerful tool frequently used to analyze data while handling ad-hoc queries and regular ETL workloads. Despite being one of the more mature solutions in the Hadoop ecosystem, developers, data scientists and IT operators are still unable to avoid common inefficiencies when running Hive at scale. Inefficient queries can mean missed SLAs, negative impact on other users, and slow database resources. Poorly tuned platforms or poorly sized queues can cause even efficient queries to suffer.

    This webinar discusses proven approaches to Hive query tuning that improve query speed and reduce cost. Learn how to understand the detailed performance characteristics of query workloads and the infrastructure-wide issues that impact these workloads.

    Pepperdata Field Engineer Kirk Lewis will discuss:

    - Finding problem queries: pinpointing delayed queries, expensive queries, and queries that waste CPU and memory
    - Improving query utilization and performance with database and infrastructure metrics
    - Ensuring your infrastructure is not adversely impacting query performance
  • 5 Kafka Best Practices Recorded: May 26 2020 32 mins
    Alex Pierce
    Learn five ways to improve your Kafka operations’ readiness and platform performance through proven Kafka best practices.

    The influx of data from a wide variety of sources is already straining your big data IT infrastructure. On top of that, data must be ingested, processed, and made available in near real-time to support business-critical use cases. Kafka data streaming is used today by 30% of Fortune 500 companies because of its ability to feed data in real-time into a predictive analytics engine in support of these use cases. However, there are critical challenges and limitations.

    By following the latest Kafka best practices, you can more easily and effectively manage Kafka. Join us for a webinar where we will discuss five specific ways to help keep your Kafka deployment optimized and more easily managed.

    Best practices covered:

    - Monitoring key component states to understand Kafka cluster health
    - Measuring crucial metrics to understand Kafka cluster performance
    - Observing critical building blocks in the Kafka hardware stack
    - Tracking important metrics for Kafka capacity planning
    - Knowing what to alert on and what can be monitored passively
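One concrete metric that spans several of these practices is consumer lag: how far a consumer group's committed offsets trail the broker's log-end offsets. The sketch below hard-codes the offsets for illustration; in a real deployment they come from the broker and the consumer group, and the topic name and numbers are made up.

```python
# Sketch: computing per-partition consumer lag, a key Kafka health
# and capacity metric. Offsets here are hard-coded stand-ins.

def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag = log-end offset minus committed offset."""
    return {tp: log_end_offsets[tp] - committed_offsets.get(tp, 0)
            for tp in log_end_offsets}

# (topic, partition) -> offset; hypothetical values.
log_end = {("clicks", 0): 1000, ("clicks", 1): 950}
committed = {("clicks", 0): 990, ("clicks", 1): 700}

lag = consumer_lag(log_end, committed)
print(lag)                # {('clicks', 0): 10, ('clicks', 1): 250}
print(max(lag.values()))  # 250
```

A lag that is steadily growing, rather than merely large, is usually the signal worth alerting on: it means consumers are falling behind producers.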
  • 4 Ways to Improve Your Big Data Analytics Stack ROI During COVID-19 Recorded: May 12 2020 36 mins
    Kirk Lewis
    Supply chain and logistic challenges due to the global COVID-19 outbreak are making it difficult for companies to address their growing big data capacity needs and purchase and provision more servers as needed.

    Many organizations are addressing these issues by expediting the use of cloud services, but this can get costly if the infrastructure is not optimized. A better solution is to improve performance and get more out of your existing infrastructure.

    Even the most experienced IT operations teams and capacity planners can't manually tune every application and workflow. At a scale of thousands of applications per day and a growth rate of dozens of nodes per year, manual efforts can't keep up.

    There's a better way: automatic capacity optimization eliminates manual tuning and allows you to run 30-50% more jobs on your existing Hadoop or Spark clusters.

    This webinar discusses four specific ways to automatically tune and optimize cluster resources, recapture wasted capacity, and improve your big data analytics stack ROI—on-premises or in the cloud.
  • Cloud Migration: Planning for Day 2 Recorded: Apr 29 2020 52 mins
    451 Research Data Analyst James Curtis and Pepperdata Customer VP Joel Stewart
    You and your organization just survived migrating to the cloud. A successful Day 1 is accomplished. But what about cloud migration Day 2? Are you prepared for life in the cloud? Your stakeholders and SLAs aren’t going to wait until things settle down. Planning for success after a cloud migration can mean the difference between seeing the ROI that cloud promises or having to consider moving back to an on-premises solution.

    Creating an accurate cloud footprint requires good planning, a deep understanding of resource utilization, and granular data. Register for this webinar and learn how you can avoid the pitfalls of your cloud migration Day 2 and continue to make data a driving force in your business.
  • Do You Have the Visibility You Need for Kafka Data Streaming? Recorded: Apr 15 2020 33 mins
    Kirk Lewis - Field Engineer
    Learn how to create a durable, low latency message system by getting improved clarity into your Kafka data streaming pipeline.

    In this webinar, learn how to:

    - Forecast Kafka data streaming capacity needs to protect throughput performance
    - Correlate infrastructure and application metrics across Kafka, Spark, Hive, Impala, HBase, and more
    - Automatically detect and alert on atypical Kafka scenarios to prevent data loss
    - Ensure preservation of SLAs for real-time stream processing applications

    Kirk Lewis will cover the challenges around monitoring Kafka data streaming analytics and how Pepperdata can help. Pepperdata enables customers to integrate Kafka metrics into a big data analytics dashboard and get detailed visibility into Kafka cluster metrics, broker health, topics, partitions, and the rate of data coming in and going out.
  • Eliminating Hive Query Chaos Recorded: Feb 25 2020 20 mins
    Alex Pierce - Field Engineer
    Get Deep Insight into Query Execution and Database Performance by Using Pepperdata Query Spotlight

    Enterprise customers report that Hive queries are a significant portion of their analytics workloads, and the performance of these workloads is critical to their big data success. Inefficient queries can mean missed SLAs, negative impact on other users, and slow database resources.

    In this webinar, Field Engineer Alex Pierce divulges how to get the 360° query view that you need as well as how to overcome the key issues customers face with queries in their deployments. Topics include:

    - Simplifying root cause analysis with visibility into delayed queries, most expensive queries, and queries that are wasting CPU and memory.
    - Improving query utilization and performance with database and infrastructure metrics.
    - Resolving problems faster through improved visibility and immediate feedback through real-time metrics.

    Alex will demonstrate the new Pepperdata query performance management solution: Pepperdata Query Spotlight. Query Spotlight makes it easy to understand the detailed performance characteristics of query workloads, together with infrastructure-wide issues that impact these workloads. With this new functionality, operators and developers can tune query workloads, debug, and optimize for better performance and reduced costs, both in the cloud and on-premises.
  • Take the Guesswork out of Migrating to the Cloud Recorded: Jan 15 2020 49 mins
    Panel: Ash Munshi, Alex Pierce, Peter Cnudde
    Learn How You Can Migrate to the Cloud and Reduce the Management Costs of a Hybrid Data Center

    In the early days of cloud migration, it was all upside: operating a data center in the cloud was always cheaper than dedicated on-premises servers. Fast-forward a few years, and IT Operations is in a visibility crisis; many big data teams cannot understand what they are spending or why.

    Ultimately, in the quest to control and understand cloud spend, analytics are critically important. Without powerful, in-depth insights, big data teams simply don’t have the information they need to do their job.

    Please join Pepperdata CEO Ash Munshi; Peter Cnudde, former VP of Engineering of Yahoo's Big Data and Machine Learning platforms; and Pepperdata Field Engineer Alex Pierce for a roundtable Q and A discussion on how to take the guesswork out of migrating to the cloud and reduce the runaway management costs of a hybrid data center.
  • Stop Manually Tuning and Start Getting ROI From Your Big Data Infrastructure Recorded: Dec 4 2019 26 mins
    Pepperdata Field Engineer, Eric Lotter
    Would your big data organization benefit from automatic capacity optimization that eliminates manual tuning and enables you to run 30-50% more jobs on your Hadoop clusters?

    As analytics platforms grow in scale and complexity, on-prem and in the cloud, managing and maintaining efficiency is a critical challenge, and inefficiency wastes money.

    In this webinar, Pepperdata Field Engineer Eric Lotter discusses how your organization can:

    – Maximize your infrastructure investment
    – Achieve up to a 50 percent increase in throughput and run more jobs on existing infrastructure
    – Ensure cluster stability and efficiency
    – Avoid overspending on unnecessary hardware
    – Spend less time in backlog queues

    On a typical cluster, hundreds or even thousands of tuning decisions are made per second, increasing enterprise cluster throughput by up to 50 percent. Even the most experienced operator dedicated to resource management can't make manual configuration changes with the required precision and speed. Learn how to automatically tune and optimize your cluster resources, and recapture wasted capacity. Eric will provide relevant use case examples and the results achieved to show you how to get more out of your infrastructure investment.
  • Introduction to Platform Spotlight - Big Data Analytics Performance Management Recorded: Nov 20 2019 8 mins
    Kirk Lewis, Pepperdata Field Engineer
    You’re constantly looking at different tools to understand the performance of your clusters, manage and monitor resource capacity, maximize your existing infrastructure investment, and forecast resource needs. But it’s impossible to have an accurate view without access to the right data.

    Your operators are challenged with configuring and sizing critical resources running on multi-tenant clusters with mixed workloads, and you receive alerts without enough detail to isolate and resolve problems. Improving the performance of your clusters and successfully managing capacity requires an understanding of dozens of performance metrics and tuning parameters.

    Pepperdata Platform Spotlight continuously collects extensive unique data—that nobody else collects—about your hosts, queues, users, applications and all relevant resources, providing you with a 360° cluster view to quickly diagnose performance issues and make resource decisions.
  • How to Overcome the Five Most Common Spark Challenges Recorded: Nov 19 2019 33 mins
    Alex Pierce, Pepperdata Field Engineer
    Review Guidelines on How to Conquer the Most Common Spark Problems You are Likely to Encounter

    Apache Spark is a full-fledged data engineering toolkit that enables you to operate on large data sets without worrying about the underlying infrastructure. Spark is known for its speed, a result of an improved implementation of MapReduce that keeps data in memory instead of persisting it to disk. However, alongside its great benefits, Spark has its issues, including complex deployment and scaling. How best to deal with these and other challenges and maximize the value you are getting from Spark?

    Drawing on experiences across dozens of production deployments, Pepperdata Field Engineer Alexander Pierce explores issues observed in a cluster environment with Apache Spark and offers guidelines on how to overcome the most common Spark problems you are likely to encounter. Alex will also accompany his presentation with demonstrations and examples. Attendees can use this information to improve the usability and supportability of Spark in their projects and successfully overcome common challenges. During this webinar, attendees will learn about:

    – Serialization and its role in Spark performance
    – Partition recommendations and sizing
    – Executor resource sizing and heap utilization
    – Driver-side vs. executor-side processing: reducing idle executor time
    – Using shading to manage library conflicts
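Partition sizing, one of the topics above, often reduces to simple arithmetic: divide the input size by a target partition size. The 128 MB target below is a common rule of thumb, not a Spark default or a Pepperdata recommendation; adjust for your own workload.

```python
# Back-of-the-envelope partition count: aim for partitions near a
# target size. 128 MB is a common rule of thumb, used here as an
# assumption for illustration.

def recommended_partitions(input_bytes: int,
                           target_partition_bytes: int = 128 * 1024**2) -> int:
    # Ceiling division, so a partial final partition still counts.
    return max(1, -(-input_bytes // target_partition_bytes))

# A hypothetical 100 GB input at 128 MB per partition:
print(recommended_partitions(100 * 1024**3))  # 800
```

Too few partitions starve the cluster of parallelism; too many drown it in scheduling and shuffle overhead, which is why this estimate is a starting point rather than a final answer.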
  • Capacity Planning for Big Data Hadoop Environments Recorded: Jul 29 2019 19 mins
    Kirk Lewis, Pepperdata Field Engineer
    Learn about Hadoop capacity planning at the cluster, queue, and application levels with Pepperdata

    As the data analytics field matures, the amount of data generated is growing rapidly, and so is its use by enterprise organizations. This increase in data improves analytics, and the result is a continuous cycle of data and information generation. To manage these new volumes of data, IT organizations and DevOps teams must understand resource usage and right-size their Hadoop clusters to balance OPEX and CAPEX.

    This presentation discusses capacity planning for big data Hadoop environments. Pepperdata field engineer Kirk Lewis explores big data Hadoop capacity planning at the cluster level, the queue level, and the application level via the Pepperdata big data performance management UI.
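In Hadoop, queue-level capacity planning is typically expressed through YARN's CapacityScheduler. The fragment below is an illustrative sketch: the queue names and percentages are assumptions for the example, not values from the webinar.

```xml
<!-- Illustrative capacity-scheduler.xml fragment: two hypothetical
     queues, "prod" and "dev", splitting cluster capacity 70/30, with
     "dev" allowed to burst up to 50% when capacity is free. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```

Cluster-level planning then asks whether the total behind these percentages is right-sized, and application-level planning asks whether individual jobs request a fair share of their queue.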
  • Optimizing the Performance of Your Critical Big Data Applications Recorded: Jun 27 2019 32 mins
    Bob Williams and Ryan Clark, Pepperdata
    Learn why your big data workloads and applications may be running slow and how to attain faster MTTR

    Webinar Date: Thursday June 27
    Time: 8:00 AM Eastern / 5:00 AM Pacific
    Duration: 30 minutes

    Moving workloads and Hadoop and Spark applications to the cloud is either a reality or a near-term goal for an overwhelming number of enterprises. For most organizations, optimizing cloud use to improve operational efficiency and achieve cost savings is a primary objective. But workload migration takes time, during which an organization must manage application performance both on-premises and in the cloud while maintaining a close watch on ROI.

    This webinar addresses key questions for organizations deploying big data workloads and applications:
    - Why is my application running slow / stopped?
    - How can I achieve faster MTTR and reduce resource requirements?
    - How can I save up to 50% on infrastructure spend and still achieve SLAs?
    - How can I automatically correlate application and infrastructure performance metrics to get the “big picture”?
    - How accurate are my cloud migration and long-term deployment cost estimates?

    Join the Pepperdata performance optimization team to learn more...
  • Seven Steps to a Successful AWS Cloud Migration Recorded: May 22 2019 31 mins
    Ashrith Mekala, Head of Engineering at Cloudwick
    Ashrith Mekala, Head of Engineering at Cloudwick, shares his experiences of guiding enterprises through the cloud migration process

    Cloud migration is more about processes than data. Even seemingly simple tasks like file distribution can require complex migration steps to ensure that the resulting cloud infrastructure matches the desired workflow. Most cloud benefits, from cost savings to scalability, are attainable. But a proven methodology, a complete understanding of the risks, careful planning, and flawless execution are necessary to realize those returns.

    Join presenter Ashrith Mekala, Head of Engineering at Cloudwick, as he shares his experience as a big data solutions architect who has successfully guided dozens of enterprises through the AWS cloud migration process. Attendees can apply these learnings to refine their own processes, avoid the risks, and optimize the benefits of current and planned cloud migrations.

    Topics include:

    – Migration models - forklift, hybrid, native
    – Framework - data migration, data validation and app integration
    – Methodology - including pre-migration state and cloud cost assessment using Pepperdata
    – Gap analysis and project planning
    – Moving from pilot to production
    – Key transition tasks and on-going support
  • Five Mistakes to Avoid When Using Spark Recorded: May 15 2019 32 mins
    Alex Pierce, Pepperdata Field Engineer
    Learn how to avoid common mistakes managing Spark in a cluster environment and improve its usability

    Apache Spark is playing a critical role in the adoption and evolution of Big Data technologies because it provides more sophisticated ways for enterprises to leverage Big Data than Hadoop. The increasing amount of data being analyzed and processed through the framework is massive and continues to push the boundaries of the engine.

    Drawing on experiences across dozens of production deployments, Pepperdata Field Engineer Alexander Pierce explores issues observed in a cluster environment with Apache Spark and offers guidelines on how to avoid common mistakes. Attendees can use these observations to improve the usability and supportability of Spark and avoid such issues in their projects.

    Topics include:

    – Serialization
    – Partition sizes
    – Executor resource sizing
    – DAG management
    – Shading
Performance Management for Big Data
Pepperdata is the Big Data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiencies, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer Big Data systems on-premises and in the cloud.

  • Title: Capacity Manager Q and A – How to Improve Productivity, Throughput, and Uptime
  • Live at: Oct 10 2018 6:00 pm
  • Presented by: Kirk Lewis, Field Engineer