Pepperdata

  • Fix Spark Failures and Bottlenecks Faster and Easier
    Vinod Nair. Recorded: Jan 17 2018 (49 mins)
    Intended for software engineers, developers, and technical leads who develop Spark applications, this webinar discusses the results of analyzing many Spark jobs on many multi-tenant production clusters, the common issues seen, the symptoms of those issues, and how developers can address them. Pepperdata has gathered trillions of performance data points on production clusters running Spark, covering a variety of industries, applications, and workload types.

    Presenter Vinod Nair will talk about key performance insights (best and worst practices, gotchas, and tuning recommendations) based on analyzing the behavior and performance of millions of Spark applications. In addition, Vinod will describe how we are turning these learnings into heuristics leveraged from the open source Dr. Elephant project.

    This webinar is followed by a live Q & A. A replay of this webinar will be available within 24 hours at https://www.pepperdata.com/resources/webinars/.
  • Pepperdata Application Summary Page Overview
    Alex Pierce. Recorded: Dec 19 2017 (22 mins)
    Find any application easily with a simple new application search capability. Intended for software engineers, developers, operators, architects, and technical leads who develop Spark applications, this overview shows how Pepperdata has simplified the task of application performance management. Pepperdata Field Engineer Alex Pierce demonstrates how to identify bottlenecks and get recommendations and insights to improve the performance of your application in one place.
  • Classifying Multi-Variate Time Series at Scale
    Ash Munshi. Recorded: Dec 7 2017 (27 mins)
    Characterizing and understanding the runtime behavior of large-scale Big Data production systems is extremely important. Typical systems consist of hundreds to thousands of machines in a cluster with hundreds of terabytes of storage costing millions of dollars, and they solve business-critical problems. By instrumenting each running process and measuring its resource utilization (CPU, memory, I/O, network, and so on) as time series, it is possible to understand and characterize the workload on these massive clusters. Each time series consists of tens to tens of thousands of data points that must be ingested and then classified. At Pepperdata, our instrumentation of the clusters collects over three hundred metrics from each task every five seconds, resulting in millions of data points per hour. At this scale, the data are equivalent to the biggest IoT data sets in the world. Our objective is to classify the collection of time series into a set of classes that represent different workload types. Phrased differently, our problem is essentially that of classifying multivariate time series.

    In this talk, intended for machine learning researchers and developers who use machine learning in their applications, Pepperdata CEO Ash Munshi presents a unique, off-the-shelf approach to classifying time series that achieves near best-in-class accuracy for univariate series and generalizes to multivariate time series.

    Before joining Pepperdata, Ash was executive chairman for Marianas Labs, a deep learning startup sold in December 2015. Prior to that he was CEO for Graphite Systems, a big data storage startup that was sold to EMC DSSD in August 2015. Munshi has also served as CTO of Yahoo and as CEO of both public and private companies, and he is on the board of several technology startups.
  • Building a Big Data Stack on Kubernetes
    Sean Suchter. Recorded: Dec 6 2017 (55 mins)
    There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark).

    Intended for software engineers, developers, architects, and technical leads who develop Spark applications, this session will discuss how to build a big data stack on Kubernetes. In particular, it will show how the Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping from Kubernetes containers to physical nodes to HDFS datanode daemons. You’ll also learn how you can provide Spark with the high availability of the critical HDFS namenode service when running HDFS in Kubernetes.
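
The container-to-node-to-datanode mapping mentioned in the Kubernetes session above can be pictured with a small sketch. This is an illustration only, not Spark's scheduler code and not the method shown in the webinar: the function names, the two lookup tables, and the example IPs and hostnames are all hypothetical, and a real deployment would populate the tables from the Kubernetes API and the HDFS namenode.

```python
from typing import Dict, List, Optional

# Hypothetical lookup tables (assumed data, not from the webinar):
# executor pod IP -> physical node, and physical node -> HDFS datanode.
pod_ip_to_node: Dict[str, str] = {
    "10.1.0.5": "node-a",
    "10.1.1.7": "node-b",
}
node_to_datanode: Dict[str, str] = {
    "node-a": "dn-a.example",
    "node-b": "dn-b.example",
}


def datanode_for_executor(
    executor_pod_ip: str,
    pod_to_node: Dict[str, str],
    node_to_dn: Dict[str, str],
) -> Optional[str]:
    """Resolve an executor pod IP to the datanode on the same physical node."""
    node = pod_to_node.get(executor_pod_ip)
    if node is None:
        return None  # unknown pod: no locality information available
    return node_to_dn.get(node)


def pick_local_executor(
    block_datanodes: List[str],
    executor_pod_ips: List[str],
    pod_to_node: Dict[str, str],
    node_to_dn: Dict[str, str],
) -> Optional[str]:
    """Return the first executor co-located with a datanode holding the block."""
    wanted = set(block_datanodes)
    for pod_ip in executor_pod_ips:
        if datanode_for_executor(pod_ip, pod_to_node, node_to_dn) in wanted:
            return pod_ip
    return None  # no node-local executor; the caller would fall back to any executor


# Usage: the block lives on dn-b.example, so the executor on node-b is chosen.
chosen = pick_local_executor(
    ["dn-b.example"], ["10.1.0.5", "10.1.1.7"], pod_ip_to_node, node_to_datanode
)
print(chosen)
```

The point of the sketch is that pod IPs alone do not match datanode addresses, so locality needs the intermediate step through the physical node.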
  • Fix Spark Failures and Bottlenecks Faster and Easier
    Vinod Nair. Recorded: Nov 16 2017 (51 mins)
    Fix Spark Failures and Bottlenecks Faster and Easier is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications. Pepperdata has gathered trillions of performance data points on production clusters running Spark, covering a variety of industries, applications, and workload types.

    This presentation discusses the results of analyzing many Spark jobs on many multi-tenant production clusters. Pepperdata Field Engineer Kirk Lewis will discuss common issues seen, the symptoms of those issues, and how developers can address them. This discussion includes key performance insights (best and worst practices, gotchas, and tuning recommendations) based on analyzing the behavior and performance of millions of Spark applications. In addition, Kirk will describe how we are turning these learnings into heuristics used in the open source Dr. Elephant project.

    This webinar is followed by a live Q & A. A replay of this webinar will be available within 24 hours at https://www.pepperdata.com/resources/webinars/.
  • Effective High-Speed Multi-Tenant Data Lakes
    Sean Suchter, CTO and founder, Pepperdata. Recorded: Oct 25 2017 (45 mins)
    Big Data has increased the demand for big data management solutions that operate at scale and meet business requirements. Big Data organizations realize quickly that scaling from small pilot projects to large-scale production clusters involves a steep learning curve. Despite tremendous progress, the operations team still needs management help in critically important areas, including multi-tenancy, performance optimization, and workflow monitoring.

    Intended for enterprises that already have a data lake or are setting up their first data lake, this presentation will discuss how to implement data lakes with operations tools that automatically optimize clusters with solutions for monitoring, performance tuning, and troubleshooting in production environments.

    Sean is the co-founder and CTO of Pepperdata. Previously, Sean was the founding GM of Microsoft’s Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, Sean managed the Yahoo Search Technology team, the first production user of Hadoop. Sean joined Yahoo through the acquisition of Inktomi, and holds a B.S. in Engineering and Applied Science from Caltech.
  • Solving Performance Bottlenecks For Spark Developers
    Vinod Nair, Director of Product Management. Recorded: Oct 11 2017 (47 mins)
    In this session, intended for software engineers, developers, architects, and technical leads who develop Spark applications, Vinod Nair will discuss how the Pepperdata product suite helps developers in Big Data environments.
  • Strata Data Conference NYC – Pepperdata – HDFS on Kubernetes: Lessons Learned
    Kimoon Kim. Recorded: Sep 28 2017 (41 mins)
    There is growing interest in running Spark natively on Kubernetes. Spark applications often access data in HDFS, and Spark supports HDFS locality by scheduling tasks on nodes that have the task input data on their local disks. Kimoon Kim demonstrates how to run HDFS inside Kubernetes to speed up Spark.
  • Top Considerations When Choosing a Big Data Management and Performance Solution
    Kirk Lewis. Recorded: Sep 20 2017 (54 mins)
    The growing adoption of Hadoop and Spark has increased the demand for Big Data management solutions that operate at scale and meet business requirements. However, Big Data organizations realize quickly that scaling from small pilot projects to large-scale production clusters involves a steep learning curve. Despite tremendous progress, there remain critically important areas, including multi-tenancy, performance optimization, and workflow monitoring, where the DevOps team still needs management help. In this webinar, field engineer Kirk Lewis discusses the top considerations when choosing a big data management and performance solution.
  • HDFS on Kubernetes: Lessons Learned
    Kimoon Kim, Pepperdata Software Engineer. Recorded: Sep 19 2017 (46 mins)
    HDFS on Kubernetes: Lessons Learned is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications and are interested in running Spark on Kubernetes. Pepperdata has been exploring Kubernetes as a potential Big Data platform with several other companies as part of a joint open source project.

    In this webinar, Kimoon Kim will show you how to:

    – Run Spark applications natively on Kubernetes
    – Enable Spark on Kubernetes to read and write data securely on HDFS protected by Kerberos
