  • Production Spark Series Part 2: Connecting Your Code to Spark Internals
    Sean Suchter, CTO/Co-Founder, Pepperdata. Recorded: May 9 2017 (39 mins)
    Spark is a dynamic execution engine that can take relatively simple Scala code and create complex, optimized execution plans. In this talk, we will describe how user code translates into Spark drivers, executors, stages, tasks, transformations, and shuffles. We will describe why this is critical to the design of Spark and how this tight interplay allows very efficient execution. Users and operators who understand these concepts will interact with Spark more effectively.
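    As a rough illustration of how a chain of transformations maps onto stages, here is a minimal Python sketch. It is a toy model, not Spark's scheduler: it assumes a pipeline can be written as a flat list of operation names, and the `WIDE_OPS` set and `split_into_stages` helper are hypothetical names introduced only for this example. The one idea it shows is that narrow transformations are pipelined within a stage, while an operation that typically requires a shuffle starts a new one.

```python
# Toy model, not Spark's actual scheduler: split a chain of RDD-style
# operations into stages, starting a new stage at every shuffle.

# Operations assumed here to require a shuffle ("wide" dependencies);
# everything else is treated as a narrow, pipelined transformation.
WIDE_OPS = {"reduceByKey", "groupByKey", "join", "repartition", "sortByKey"}


def split_into_stages(ops):
    """Group operation names into stages, cutting before each wide op."""
    stages = [[]]
    for op in ops:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: downstream work is a new stage
        stages[-1].append(op)
    return stages


# Hypothetical word-count pipeline written as a list of operation names.
word_count = ["textFile", "flatMap", "map", "reduceByKey", "filter"]
stages = split_into_stages(word_count)
for i, stage in enumerate(stages):
    print(f"stage {i}: {' -> '.join(stage)}")
```

    In real Spark, part of a shuffle operation (for example, map-side combining) runs in the upstream stage, and each stage is further divided into one task per partition; the sketch ignores both details.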
  • Big Data for Big Data: Machine Learning Models of Hadoop Cluster Behavior
    Sean Suchter, CTO/Co-Founder, Pepperdata and Shekhar Gupta, Software Engineer, Pepperdata. Recorded: Apr 10 2017 (37 mins)
    Learn how to use machine learning to improve cluster performance.

    This talk describes the use of very fine-grained performance data from many Hadoop clusters to build a model predicting excessive swapping events.

    The performance of batch processing systems such as YARN is generally determined by throughput, which measures the amount of workload (tasks) completed in a given time window. For a given cluster size, throughput can be increased by running as much workload as possible on each host, to utilize all the free resources available on the host. Because each node runs a complex combination of different tasks/containers, the performance characteristics of the cluster change dynamically. As a result, there is always a danger of overutilizing host memory, which can result in extreme swapping, or thrashing. The impact of thrashing can be very severe; it can actually reduce throughput instead of increasing it.

    By using very fine-grained (5-second) data from many production clusters running very different workloads, we have trained a generalized model that detects the onset of thrashing within seconds of the first symptom. This detection has proven fast enough to enable effective mitigation of the negative effects of thrashing, allowing the hosts to continuously provide high throughput.

    To build this system, we used hand-labeling of bad events combined with large-scale data processing using Hadoop, HBase, Spark, and IPython for experimentation. We will discuss the methods used as well as the novel findings about Big Data cluster performance.
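    To make the idea of detecting thrashing from 5-second samples concrete, here is a minimal Python sketch. It is deliberately not the trained model described above: it swaps in a simple threshold rule, and the metric (swap activity in pages per second), the `threshold` and `consecutive` parameters, and the sample values are all assumptions made up for illustration.

```python
# Illustrative stand-in, not the trained model from the talk: flag the onset
# of thrashing when a swap-rate metric stays high for consecutive samples.

SAMPLE_SECONDS = 5  # sampling interval cited in the abstract


def detect_thrashing_onset(swap_rates, threshold=1000.0, consecutive=2):
    """Return the index where the first sustained high-swap run begins.

    swap_rates  -- per-sample swap activity (hypothetical unit: pages/sec)
    threshold   -- hypothetical cutoff above which a sample counts as "high"
    consecutive -- how many high samples in a row are required
    Returns None if no such run exists.
    """
    run = 0
    for i, rate in enumerate(swap_rates):
        run = run + 1 if rate > threshold else 0
        if run == consecutive:
            return i - consecutive + 1
    return None


# Made-up sample data: one isolated spike, then a sustained rise.
rates = [10.0, 50.0, 1500.0, 20.0, 1200.0, 1800.0, 2500.0]
onset = detect_thrashing_onset(rates)
if onset is not None:
    print(f"sustained swapping began at t={onset * SAMPLE_SECONDS}s")
```

    Requiring several consecutive high samples keeps a single spike from triggering mitigation; a learned model would presumably weigh many more signals than this one metric.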
  • Production Spark Webinar Series - Part 1: Best Practices for Spark in Production
    Chad Carson, Co-Founder and Ed Colonna, VP of Marketing. Recorded: Mar 7 2017 (59 mins)
    Join us for Part 1 of our Production Spark Webinar Series. This first installment gathers Spark experts and practitioners from varying backgrounds to discuss the top trends, challenges, and use cases for production Spark applications. Our expert panel will discuss several key considerations when running Spark in production and take questions directly from the audience.

    Our distinguished panel of industry experts is as follows:

    Dr. Babak Behzad, Senior Software Engineer, SAP/Altiscale
    Charles Boicey, Chief Innovation Officer, Clearsense
    Richard Williamson, Principal Engineer, Silicon Valley Data Science
    Andrew Ray, Principal Data Engineer, Silicon Valley Data Science
    Sean Suchter, CTO and Co-Founder, Pepperdata
  • Philips Wellcentive Cuts Hadoop Troubleshooting from Months to Hours
    Geovanie Marquez, Hadoop Architect at Philips Wellcentive. Recorded: Dec 6 2016 (48 mins)
    Philips Wellcentive, a SaaS health management and data analytics company, relies on a nightly MapReduce job to process and analyze data for its entire patient population, from birth to the present day. The job assesses a number of different characteristics and powers the analytics that physician organizations need to deliver better services. When this job began to fail repeatedly, the Hadoop team spent months trying to identify the root cause using existing monitoring tools but could not explain the job failures and slowdowns.

    Join our webinar to hear more about why existing Hadoop monitoring tools were insufficient to diagnose the root cause of Philips Wellcentive’s problems and how Pepperdata helped them to significantly improve their Big Data operations. The webinar will cover the different approaches that Philips Wellcentive took to rectify their missed SLAs, and how Pepperdata ultimately helped them quickly troubleshoot their performance problems and ensure their jobs complete on time.
  • Effectively Manage Multi-tenant Hadoop for the Enterprise
    Sean Suchter, CTO of Pepperdata. Recorded: Nov 14 2016 (39 mins)
    As the Hadoop market matures and new applications and use cases for Big Data emerge, organizations are dealing with more complex environments than ever before. In days past, deployments often focused on single, batch-oriented workloads, and if you wanted to run multiple workloads at the same time, you needed to split your clusters. With Hadoop 2 and YARN, organizations are able to run multiple workloads on the same cluster. But in multi-tenant environments, resource contention can become a daily problem, and low-priority, ad hoc jobs can sometimes monopolize hardware resources that are needed for high-priority workloads.

    Pepperdata is the first and only software that guarantees service levels in multi-tenant Hadoop environments. We have helped dozens of companies, across industries and of all sizes, to effectively manage and scale their multi-tenant environments, guaranteeing service levels and improving overall cluster performance.

    Join us for this webcast to hear best practices for running multi-tenant environments and how you can improve visibility, performance, and overall management of your big data environment.
  • Improve Amazon EMR Performance up to 4X
    Vinod Nair, Product Manager at Pepperdata. Recorded: Oct 13 2016 (36 mins)
    Are you currently running Amazon EMR but lacking visibility into how your cluster is performing? Pepperdata for Amazon EMR enables users of Amazon Elastic MapReduce to run jobs up to four times faster and simultaneously reduce costs. Users can see over 300 metrics even after the cluster has been terminated, giving them a historical view of performance.

    Register for our webinar to learn how Amazon EMR can help streamline your big data projects, and how Pepperdata can help you get the most value from your investment.
  • Ensure Quality of Service in Multi-tenant Hadoop Environments
    Sean Suchter of Pepperdata and Andy Oram of O'Reilly Media. Recorded: Aug 3 2016 (44 mins)
    This webcast will show you how Pepperdata can help your organization guarantee quality of service in multi-tenant Hadoop environments by eliminating resource contention and guaranteeing service levels for high-priority jobs. Run HBase, MapReduce, Spark, Hive, and more all on a single cluster without worrying about jobs stomping on each other. We'll show you how Pepperdata automates cluster optimization to reduce time and cost, and keep your Hadoop humming happily.
  • Keeping the Trains Running: Effective Troubleshooting for Hadoop
    Sean Suchter, CEO of Pepperdata and Dez Blanchfield, Data Scientist at the Bloor Group. Recorded: May 4 2016 (67 mins)
    When something goes wrong on your Hadoop cluster (a missed job, a sudden performance slowdown, or a massive spike in I/O), are you able to pinpoint the exact cause of the issue? Often it can take hours or days, and sometimes the cause is never discovered. Join us for this webinar to see how Pepperdata reduces troubleshooting times by 90% and can prevent most performance problems from ever happening in the first place.
  • Overcome the limitations of distributed computing with real-time intelligence
    Sean Suchter, CEO and co-founder of Pepperdata. Recorded: Mar 2 2016 (49 mins)
    Most Hadoop admins have learned a set of “best practices” to address and mitigate Hadoop performance problems, such as manual tuning, cluster isolation, or adding new hardware, but most of these remedies are not sustainable long-term solutions. Join us for this webcast as we survey some of these “best practices” and offer up some new ways to address the performance gap. We’ll also tell you the warning signs to look out for, so you can assess the health and production readiness of your cluster.

    In this webcast, we’ll examine:

    the reality of what the current tools in the ecosystem do before and after a job has run,
    the most common “best practice” approaches to improve performance and the positive and negative outcomes of each, and
    a new approach to performance gains: how to use software to fill the gap in human capability.
  • Optimizing Big Data Clusters in Production - Performance, Capability, and Cost
    Sean Suchter, Pepperdata Co-founder and CEO and Mike Matchett, Sr. Analyst at Taneja Group. Recorded: Nov 18 2015 (54 mins)
    Come join us as we learn how to tackle and manage big data application performance. First, Taneja Group Sr. Analyst Mike Matchett will present his take on how enterprise IT is now being challenged to support big data applications in real production environments. He'll discuss why too many enterprises haven't been as successful as they should be in taking advantage of their big data opportunities, in many cases losing out to competitors. He'll explore what agile IT/DevOps really needs to do not only to host big data applications effectively, but also to deliver top-notch, consistent big data performance with the smallest infrastructure cost.
    Then Sean Suchter, co-founder and CEO at Pepperdata, will present the company's approach to solving big data cluster performance challenges. He'll demonstrate how Pepperdata's dynamic run-time optimizations can guarantee consistent performance SLAs in a shared multi-tenant Hadoop cluster. Because Pepperdata delivers detailed visibility into Hadoop cluster activity, the software becomes invaluable for cluster troubleshooting, reporting/chargeback, capacity planning, and other management and optimization requirements. With Pepperdata, IT can now effectively, efficiently, and reliably support all the business-empowering big data applications of an organization. This webcast will be 45 minutes with time reserved for Q+A.
