Production Spark Webinar Series - Part 1: Best Practices for Spark in Production

Join us for Part 1 of our Production Spark Webinar Series. This first installment gathers Spark experts and practitioners from varying backgrounds to discuss the top trends, challenges, and use cases for production Spark applications. Our expert panel will discuss several key considerations when running Spark in production and take questions directly from the audience.

Our distinguished panel of industry experts is as follows:

Dr. Babak Behzad, Senior Software Engineer, SAP/Altiscale
Charles Boicey, Chief Innovation Officer, Clearsense
Richard Williamson, Principal Engineer, Silicon Valley Data Science
Andrew Ray, Principal Data Engineer, Silicon Valley Data Science
Sean Suchter, CTO and Co-Founder, Pepperdata
Recorded Mar 7 2017 59 mins
Presented by
Chad Carson, Co-Founder and Ed Colonna, VP of Marketing

  • Diagnosing Application Failures and Errors Recorded: May 16 2018 40 mins
    Kirk Lewis
    This webinar will present the results of analyzing many Hadoop and Spark jobs on many multi-tenant production clusters. We will cover common issues seen, the symptoms of those issues, and how to address them. We will also discuss the Pepperdata APM solution and best practices for diagnosing application failures and errors.

    Pepperdata has gathered trillions of performance data points on production clusters running Hadoop and Spark, covering a variety of industries, applications, and workload types. We will present key performance insights — best and worst practices, gotchas, and tuning recommendations — based on analyzing the behavior and performance of millions of applications.
  • Building a Big Data Stack on Kubernetes Recorded: May 2 2018 48 mins
    Sean Suchter
    There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark). Intended for software engineers, developers, architects and technical leads who develop Spark applications, this session will discuss how to build a big data stack on Kubernetes. In particular, Sean will demonstrate:

    – The official Apache Spark 2.3 Kubernetes integration
    – How the Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping of Kubernetes containers to physical nodes to HDFS datanode daemons.
    – How you can provide Spark with the high availability of the critical HDFS namenode service when running HDFS in Kubernetes.
  • Total Performance Management (TPM) for Hadoop and Spark Recorded: Apr 25 2018 23 mins
    Alex Pierce
    Pepperdata makes Hadoop+YARN based systems better by providing total performance management (TPM) for big data. Total performance management is the combination of application performance management (APM) and operations performance management (OPM) in a single package so developers and operators can rely on the same underlying information to build and operate highly performant big data applications in multi-tenant clusters.

    For developers, the Application Spotlight self-service APM portal surfaces applications that need attention from a performance perspective. For these applications, Application Spotlight provides precise recommendations to improve performance, automatically identifies bottlenecks and makes it easy to analyze root cause of errors and failures.

    For operators, the Cluster Analyzer OPM solution makes it easy to identify applications and users causing issues on the platform, proactively alert on those issues, and improve cluster performance. We also have roll up reports for things like chargeback and capacity planning. The Capacity Optimizer add-on module automatically increases cluster throughput 30-50% by addressing some of the inefficiencies of how YARN does resource management today.

    Join us for this webinar presented by Alex Pierce and learn how we can bring performance management to your applications and your cluster.
  • Doesn't YARN Already Do This? Recorded: Apr 11 2018 33 mins
    Kirk Lewis
    This webinar discusses the limitations of manually tuning Hadoop and how Pepperdata improves YARN and the ResourceManager.

    Pepperdata makes Hadoop+YARN based systems better by providing total performance management (TPM) for big data. Total performance management is the combination of application performance management (APM) and operations performance management (OPM) in a single package so developers and operators can rely on the same underlying information to build and operate highly performant big data applications in multi-tenant clusters.

    For developers, the Application Spotlight self-service APM portal surfaces applications that need attention from a performance perspective. For these applications, Application Spotlight provides precise recommendations to improve performance, automatically identifies bottlenecks and makes it easy to analyze root cause of errors and failures.

    For operators, the Cluster Analyzer OPM solution makes it easy to identify applications and users causing issues on the platform, proactively alert on those issues, and improve cluster performance. We also have roll up reports for things like chargeback and capacity planning. The Capacity Optimizer add-on module automatically increases cluster throughput 30-50% by addressing some of the inefficiencies of how YARN does resource management today.

    Pepperdata solutions are certified for use with Cloudera, Hortonworks, and MapR. Sometimes we are asked, “Doesn’t YARN already do this?” The answer is that Pepperdata does not replace YARN or the ResourceManager, but can significantly augment their capabilities.
  • Make Spark Apps Go Fast! Fix Failures & Bottlenecks with Application Spotlight Recorded: Apr 4 2018 43 mins
    Vinod Nair
    Pepperdata Application Spotlight analyzes all Hadoop and Spark jobs running on the cluster and provides developers with technical insights on how each job performed. Application Spotlight provides relevant application information, insights, and calls to action, all in one place, so that developers can easily and quickly perform these tasks.

    In addition to making jobs go faster, Application Spotlight helps developers to be better tenants in multi-tenant clusters by showing them how to write optimal jobs and more efficiently use their queue and cluster resources with practical, innovative application performance management solutions. Application Spotlight enables developers to quickly understand performance impacts and get recommendations on how to better optimize their jobs.

    Intended for software engineers, developers, and technical leads who develop Spark applications, this webinar demonstrates how Application Spotlight helps developers quickly improve application performance, reduce resource usage, and understand application failures. Learn how developers can:

    – Generate application-specific recommendations to improve application performance
    – Highlight applications that need attention
    – Automatically identify bottlenecks, and alert on duration, failure conditions, and resource usage.
    – Search for any applications running on the cluster, compare current and previous runs, and visualize Spark applications and their stages for easy root cause failure analysis and performance tuning.

    Presenter:

    Vinod Nair leads product management at Pepperdata. He brings more than 20 years of experience in engineering and product management to the job, with a special interest in distributed systems and Hadoop. He has worked in software for telecommunications, financial management for small business, and big data. Vinod’s approach to product management is deeply influenced by his success in applying Lean Startup principles and rapid iteration to product design and development.
  • Three Ways That Operators Can Fix Slowdowns and Improve Cluster Performance Recorded: Mar 21 2018 58 mins
    Kirk Lewis
    Despite tremendous progress, there remain critically important areas, including multi-tenancy, performance optimization, and workflow monitoring, where the DevOps team still requires management help. In this webinar, presenter Kirk Lewis discusses the ways that big data clusters slow down, how to fix them, and how to keep them running at an optimal level. He also presents an overview of Pepperdata operations performance management (OPM) solutions. In this online webinar followed by a live Q and A, Field Engineer Kirk Lewis discusses:

    • How Pepperdata Cluster Analyzer helps operators overcome Hadoop and Spark performance limitations by monitoring all facets of cluster performance in real time, including CPU, RAM, disk I/O, and network usage by user, job, and task.

    • How Pepperdata Capacity Optimizer increases capacity utilization by 30-50% without adding new hardware

    • How Pepperdata adaptively and automatically tunes the cluster based on real-time resource utilization with performance improvement results that cannot be achieved through manual tuning.

    Presenter Bio

    Kirk Lewis joined Pepperdata in 2015. Previously, he was a Solutions Engineer at StackVelocity. Before that he was the lead technical architect for big data production platforms at American Express. Kirk has a strong background in big data.
  • HDFS High Availability (HA) on Kubernetes Recorded: Mar 14 2018 31 mins
    Kimoon Kim
    Part of our Kubernetes, Lessons Learned Series, HDFS High Availability (HA) on Kubernetes is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications and are interested in running Spark on Kubernetes while accessing HDFS data.

    Pepperdata has been exploring Kubernetes as a potential Big Data platform with several other companies as part of a joint open source project. In this webinar, Kimoon Kim will show you how to:

    – Run Spark applications natively on Kubernetes
    – Set up HDFS on Kubernetes in HA (High Availability) mode to ensure data durability

    Kimoon joined Pepperdata in 2013. Previously, he worked for the Google Search and Yahoo Search teams for many years. Kimoon has hands-on experience with large distributed systems processing massive data sets.
  • Spark Application Performance Management with Pepperdata Application Spotlight Recorded: Feb 21 2018 43 mins
    Vinod Nair
    Pepperdata Application Spotlight analyzes all Hadoop and Spark jobs running on the cluster and provides developers with technical insights on how each job performed. Intended for software engineers, developers, and technical leads who develop Spark applications, this webinar demonstrates how Application Spotlight helps developers quickly improve application performance, reduce resource usage, and understand application failures. Participate in this webinar and learn how developers can:

    –Identify the lines of code and the stages that cause performance issues related to CPU, memory, garbage collection, network, and disk I/O

    –Easily disambiguate resources used during parallel stages

    –Understand why run-time variations occur for the same application

    –Determine whether performance issues are due to the application or other workloads on the cluster

    –Receive actionable recommendations for tuning jobs

    –Validate tuning changes made to applications with a before and after comparison

    –View highlights of the worst-performing phases of jobs

    –Improve MapReduce and Spark developer productivity

    –Improve cluster efficiency based on clear recommendations on how to modify workloads and configurations


    Vinod Nair leads product management at Pepperdata. He brings more than 20 years of experience in engineering and product management to the job, with a special interest in distributed systems and Hadoop. He has worked in software for telecommunications, financial management for small business, and big data. Vinod’s approach to product management is deeply influenced by his success in applying Lean Startup principles and rapid iteration to product design and development.
  • Pepperdata Hadoop and Spark Performance Solutions for Dev and Ops Recorded: Feb 7 2018 58 mins
    Kirk Lewis
    Despite tremendous progress, there remain critically important areas, including multi-tenancy, performance optimization, and workflow monitoring where the DevOps team still needs management help. Pepperdata is the first company to integrate deep performance measurement and understanding into the DevOps process for Big Data applications. Pepperdata products enable developers to rapidly debug, optimize, and understand production applications while also enabling operators to diagnose and automatically solve performance problems in production multi-tenant clusters. Presented by Field Engineer Kirk Lewis, this webinar is an overview of Pepperdata products and services.

    In this online webinar followed by a live Q and A, Field Engineer Kirk Lewis will show you how to:

    • Reduce time to problem resolution using comprehensive and detailed performance data–Pepperdata Platform Spotlight helps operators overcome Hadoop and Spark performance limitations by monitoring all facets of cluster performance in real time, including CPU, RAM, disk I/O, and network usage by user, job, and task.

    • Increase capacity utilization by 30-50% without adding new hardware–Pepperdata adaptively and automatically tunes the cluster based on real-time resource utilization with performance improvement results that cannot be achieved through manual tuning.

    • Help developers understand and improve application performance–Pepperdata Application Spotlight enables developers to identify and fix application performance problems, excessive usage of resources, and application errors.
  • Building a Big Data Stack on Kubernetes Recorded: Jan 25 2018 51 mins
    Pepperdata Founder and CTO, Sean Suchter
    There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark).

    Intended for software engineers, developers, architects and technical leads who develop Spark applications, this session will discuss how to build a big data stack on Kubernetes. In particular, Sean will demonstrate:

    – The official Apache Spark 2.3 Kubernetes integration
    – How the Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping of Kubernetes containers to physical nodes to HDFS datanode daemons.
    – How you can provide Spark with the high availability of the critical HDFS namenode service when running HDFS in Kubernetes.
  • Fix Spark Failures and Bottlenecks Faster and Easier Recorded: Jan 17 2018 49 mins
    Vinod Nair
    Intended for software engineers, developers, and technical leads who develop Spark applications, this webinar discusses the results of analyzing many Spark jobs on many multi-tenant production clusters, the common issues seen, the symptoms of those issues, and how developers can address them. Pepperdata has gathered trillions of performance data points on production clusters running Spark, covering a variety of industries, applications, and workload types.

    Presenter Vinod Nair will talk about key performance insights (best and worst practices, gotchas, and tuning recommendations) based on analyzing the behavior and performance of millions of Spark applications. In addition, Vinod will describe how we are turning these learnings into heuristics leveraged from the open source Dr. Elephant project.

    This webinar is followed by a live Q & A. A replay of this webinar will be available within 24 hours at https://www.pepperdata.com/resources/webinars/.
  • Pepperdata Application Summary Page Overview Recorded: Dec 19 2017 22 mins
    Alex Pierce
    Find any application easily with a simple new application search capability. Intended for software engineers, developers, operators, architects and technical leads who develop Spark applications, Pepperdata has simplified the task of application performance management. Pepperdata Field Engineer Alex Pierce demonstrates how to identify bottlenecks and get recommendations and insights to improve the performance of your application in one place.
  • Classifying Multi-Variate Time Series at Scale Recorded: Dec 7 2017 27 mins
    Ash Munshi
    Characterizing and understanding the runtime behavior of large-scale Big Data production systems is extremely important. Typical systems consist of hundreds to thousands of machines in a cluster with hundreds of terabytes of storage costing millions of dollars, solving problems that are business critical. By instrumenting each running process and measuring its resource utilization (CPU, memory, I/O, network, etc.) as time series, it is possible to understand and characterize the workload on these massive clusters. Each time series consists of tens to tens of thousands of data points that must be ingested and then classified. At Pepperdata, our instrumentation of the clusters collects over three hundred metrics from each task every five seconds, resulting in millions of data points per hour. At this scale the data are equivalent to the biggest IoT data sets in the world. Our objective is to classify the collection of time series into a set of classes that represent different workload types. Phrased differently, our problem is essentially that of classifying multivariate time series.

    Intended for machine learning researchers and developers who use machine learning in their applications, Pepperdata CEO Ash Munshi presents a unique, off-the-shelf approach to classifying time series that achieves near best-in-class accuracy for univariate series and generalizes to multivariate time series.

    Before joining Pepperdata, Ash was executive chairman for Marianas Labs, a deep learning startup sold in December 2015. Prior to that he was CEO for Graphite Systems, a big data storage startup that was sold to EMC DSSD in August 2015. Munshi also served as CTO of Yahoo, as a CEO of both public and private companies, and is on the board of several technology startups.
  • Building a Big Data Stack on Kubernetes Recorded: Dec 6 2017 55 mins
    Sean Suchter
    There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark).

    Intended for software engineers, developers, architects and technical leads who develop Spark applications, this session will discuss how to build a big data stack on Kubernetes. In particular, it will show how the Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping of Kubernetes containers to physical nodes to HDFS datanode daemons. You’ll also learn how you can provide Spark with the high availability of the critical HDFS namenode service when running HDFS in Kubernetes.
  • Fix Spark Failures and Bottlenecks Faster and Easier Recorded: Nov 16 2017 51 mins
    Vinod Nair
    Fix Spark Failures and Bottlenecks Faster and Easier is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications. Pepperdata has gathered trillions of performance data points on production clusters running Spark, covering a variety of industries, applications, and workload types.

    This presentation discusses the results of analyzing many Spark jobs on many multi-tenant production clusters. Pepperdata Field Engineer, Kirk Lewis will discuss common issues seen, the symptoms of those issues, and how developers can address them. This discussion includes key performance insights — best and worst practices, gotchas, and tuning recommendations — based on analyzing the behavior and performance of millions of Spark applications. In addition, Kirk will describe how we are turning these learnings into heuristics used in the open source Dr. Elephant project.

    This webinar is followed by a live Q & A. A replay of this webinar will be available within 24 hours at https://www.pepperdata.com/resources/webinars/.
  • Effective High-Speed Multi-Tenant Data Lakes Recorded: Oct 25 2017 45 mins
    Sean Suchter, CTO and founder, Pepperdata
    Big Data has increased the demand for big data management solutions that operate at scale and meet business requirements. Big Data organizations realize quickly that scaling from small, pilot projects to large-scale production clusters involves a steep learning curve. Despite tremendous progress, critically important areas including multi-tenancy, performance optimization, and workflow monitoring remain areas where the operations team still needs management help.

    Intended for enterprises who already have a data lake or are setting up their first data lake, this presentation will discuss how to implement data lakes with operations tools that automatically optimize clusters with solutions for monitoring, performance tuning, and troubleshooting in production environments.

    Sean is the co-founder and CTO of Pepperdata. Previously, Sean was the founding GM of Microsoft’s Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, Sean managed the Yahoo Search Technology team, the first production user of Hadoop. Sean joined Yahoo through the acquisition of Inktomi, and holds a B.S. in Engineering and Applied Science from Caltech.
  • Solving Performance Bottlenecks For Spark Developers Recorded: Oct 11 2017 47 mins
    Vinod Nair, Director of Product Management
    Intended for software engineers, developers, architects and technical leads who develop Spark applications, this webinar features Vinod Nair discussing how the Pepperdata product suite helps developers in Big Data environments.
  • Strata Data Conference NYC – Pepperdata – HDFS on Kubernetes: Lessons Learned Recorded: Sep 28 2017 41 mins
    Kimoon Kim
    There is growing interest in running Spark natively on Kubernetes. Spark applications often access data in HDFS, and Spark supports HDFS locality by scheduling tasks on nodes that have the task input data on their local disks. Kimoon Kim demonstrates how to run HDFS inside Kubernetes to speed up Spark.
  • Top Considerations When Choosing a Big Data Management and Performance Solution Recorded: Sep 20 2017 54 mins
    Kirk Lewis
    The growing adoption of Hadoop and Spark has increased the demand for Big Data management solutions that operate at scale and meet business requirements. However, Big Data organizations realize quickly that scaling from small, pilot projects to large-scale production clusters involves a steep learning curve. Despite tremendous progress, there remain critically important areas, including multi-tenancy, performance optimization, and workflow monitoring, where the DevOps team still needs management help. In this webinar, field engineer Kirk Lewis discusses the top considerations when choosing a big data management and performance solution.
  • HDFS on Kubernetes: Lessons Learned Recorded: Sep 19 2017 46 mins
    Kimoon Kim, Pepperdata Software Engineer
    HDFS on Kubernetes: Lessons Learned is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications and are interested in running Spark on Kubernetes. Pepperdata has been exploring Kubernetes as a potential Big Data platform with several other companies as part of a joint open source project.

    In this webinar, Kimoon Kim will show you how to: 

    –Run Spark applications natively on Kubernetes
    –Enable Spark on Kubernetes to read and write data securely on HDFS protected by Kerberos
DevOps for Big Data
Pepperdata is the DevOps for Big Data company. Leading enterprise companies depend on Pepperdata to manage and improve the performance of Hadoop and Spark. Developers and operators use Pepperdata products and services to diagnose and solve performance problems in production and increase cluster utilization. The Pepperdata product suite improves communication of performance issues between Dev and Ops, shortens time to production, and increases cluster ROI. Pepperdata products and services work with customer Big Data systems both on-premises and in the cloud.

  • Title: Production Spark Webinar Series - Part 1: Best Practices for Spark in Production
  • Live at: Mar 7 2017 7:00 pm
  • Presented by: Chad Carson, Co-Founder and Ed Colonna, VP of Marketing