Five Mistakes to Avoid When Using Spark

Apache Spark is playing a critical role in the adoption and evolution of Big Data technologies because it gives enterprises more sophisticated ways to leverage Big Data than Hadoop does. The amount of data analyzed and processed through the framework is massive and continues to push the boundaries of the engine.

Drawing on experiences across dozens of production deployments, Pepperdata Field Engineer Alexander Pierce explores issues observed in a cluster environment with Apache Spark and offers guidelines on how to avoid common mistakes. Attendees can use these observations to improve the usability and supportability of Spark and avoid such issues in their projects.

Topics include:

– Serialization
– Partition sizes
– Executor resource sizing
– DAG management
– Shading
Recorded May 15 2019 32 mins
Presented by
Alex Pierce, Pepperdata Field Engineer
  • Seven Steps to a Successful AWS Cloud Migration Recorded: May 22 2019 31 mins
    Ashrith Mekala, Head of Engineering at Cloudwick
    Cloud migration is more about processes than data. Even seemingly simple tasks like file distribution can require complex migration steps to ensure that the resulting cloud infrastructure matches the desired workflow. Most cloud benefits, from cost savings to scalability, are justifiable. But a proven methodology, a complete understanding of the risks, careful planning and flawless execution are necessary to realize those returns.

    Join presenter Ashrith Mekala, Head of Engineering at Cloudwick, as he shares his experience as a big data solutions architect who has successfully guided dozens of enterprises through the AWS cloud migration process. Attendees can apply these learnings to refine their own processes, avoid the risks, and optimize the benefits of current and planned cloud migrations.

    Topics include:

    – Migration models - forklift, hybrid, native
    – Framework - data migration, data validation and app integration
    – Methodology - including pre-migration state and cloud cost assessment using Pepperdata
    – GAP analysis and project planning
    – Moving from pilot to production
    – Key transition tasks and on-going support
  • Cloud Migration: Opportunities and Risks for the Business Unit Recorded: Apr 24 2019 31 mins
    John Armstrong, Head of Product Marketing
    The business case for cloud migration is compelling: cost reductions, ease of growth and expansion, outsourcing of infrastructure and maintenance, and improved access to the latest technologies. However, many organizations that are migrating to the cloud focus on technical factors and overlook the broader business implications of their projects. As a best practice, assessing the opportunities and risks for the organization should be a joint effort led by the IT and business unit teams. At the highest level, a business justification focuses on the return on investment (ROI) associated with a proposed technical change. However, many supporting data points are required to populate the formula and achieve a realistic calculation.

    In this webinar, we will address these and other critical questions:

    What are the key business expectations associated with a cloud migration?
    What are the implications if the cloud migration doesn’t go as planned?
    What business considerations should be included in any cloud migration plan?

    As always, this webinar will be followed by a short Q and A session with the audience. Please join us!
  • What is Pepperdata? Recorded: Mar 25 2019 2 mins
    Pepperdata
    Pepperdata has engineered a big data APM solution that empowers operators to automatically optimize the performance and capacity of their big data infrastructure while enabling developers to improve the performance of their applications.

    Unlike other APM tools that merely summarize static data and make application performance recommendations in isolation, Pepperdata delivers complete system analytics on hundreds of real-time operational metrics continuously collected from applications as well as the infrastructure — including CPU, RAM, disk I/O, and network usage metrics on every job, task, user, host, workflow, and queue.

    The result is a comprehensive, intuitive dashboard that provides a holistic view of cluster resources, system alerts, and dynamic recommendations for more accurate and effective troubleshooting, capacity planning, reporting, and application performance management.

    Pepperdata diagnoses problems quickly, automatically alerts about critical conditions affecting system performance, and provides recommendations for rightsizing containers, queues and other resources. Leveraging AI-driven resource management, Pepperdata tunes and optimizes infrastructure resources to recapture wasted capacity and get the most out of the infrastructure.

    Welcome to the new world of real-time big data application and infrastructure performance management.

    Welcome to Pepperdata.

    Optimize your infrastructure, your applications, and your time — at scale.
  • Pepperdata Application Spotlight FREE Recorded: Mar 25 2019
    Pepperdata
    Use Application Spotlight for free on up to 20 Nodes
    Pepperdata Application Spotlight is a self-service APM solution that provides developers with a holistic and real-time view of their applications in the context of the entire big data cluster, allowing them to quickly identify and fix problems (failed Spark applications, for instance) to improve application runtime, predictability, performance and efficiency.
  • Successfully Migrating Big Data Workloads to the Cloud: What You Need to Know Recorded: Mar 20 2019 28 mins
    John Armstrong, Head of Product Marketing, Pepperdata
    Moving workloads to the cloud is either a reality or a near-term goal for an overwhelming number of enterprises. For most organizations, optimizing cloud use to improve operational efficiency and achieve cost savings is the primary objective. But navigating cloud adoption is a complex process that requires careful planning and analyses to achieve desired economic goals and ensure success. It’s a technology decision that has significant impact on the business.

    Economic benefits vs. costs must be accurately estimated and carefully weighed before making a move to cloud…not just for the cluster, but for every workload queue. This webinar will take the guesswork out of calculating cloud migration costs and provide you with the detailed analyses you need to make fully-informed technical and business decisions before embarking on your cloud migration journey.

    This webinar addresses critical questions for organizations considering or already deploying big data workloads in the cloud:

    - How accurate are my cloud migration and long-term deployment cost estimates?
    - Which queues will be more cost-effective in the cloud, and which ones are better left on-premises?
    - What AWS, Azure, Google, or IBM cloud instances will work best for each of my queues? CPU-optimized? Memory-optimized? General purpose?
    - How can I help my team to make a successful transition to deploying workloads using the public cloud?
  • Breaking Through Big Data Bottlenecks Recorded: Mar 6 2019 28 mins
    Kirk Lewis, Pepperdata Field Engineer
    Bottlenecks are a fact of life in IT. No matter how fast you build something, somebody will find a way to max it out. But bottlenecks can be crippling to organizations whose business operations depend on reliable and consistent service levels. Deploying an application performance management (APM) solution optimized to address big data challenges is essential for rapidly identifying and overcoming congestion within the operational environment.

    In this webinar, we will:
    · Walk through a number of bottlenecks ranging from “easy to find” to “hard to find”
    · Discuss examples involving CPU (easy), memory, network and I/O (hard)
    · Show you how you can quickly identify root cause and resolve big data bottlenecks
  • 8 ROI Benefits of APM Recorded: Feb 20 2019 27 mins
    John Armstrong, Head of Product Marketing, Pepperdata
    For most enterprises, APM is considered an essential element of IT operations, bridging production and development with IT and digital business. As companies invest in new technology and projects in their digital transformation journeys, it’s critical to understand the ROI value of those investments.

    This webinar will look at eight ROI benefits of APM, both financial and non-financial, that organizations need to consider when evaluating APM solutions. These include increased developer productivity, reduced downtime, improved business continuity and more.

    Attendees will learn:
    - 8 elements to consider when assessing APM solutions
    - How to evaluate financial and non-financial benefits to technology solutions
    - How leading organizations measure ROI, often through hard lessons learned
  • Ensuring Uptime for Healthcare Recorded: Feb 4 2019 2 mins
    Dr. Charles Boicey, Clearsense Chief Innovation Officer
    “There is no tolerance for downtime in healthcare, which is why we bought Pepperdata. We started using Pepperdata on day one because Pepperdata instruments and monitors the resources as well as the applications running on the Clearsense Platform.

    “No one else does that. We couldn’t do what we do without Pepperdata,”
    –Dr. Charles Boicey, Clearsense Chief Innovation Officer
  • Leveraging APM to Overcome Big Data Challenges Recorded: Jan 16 2019 40 mins
    John Armstrong, Head of Product Marketing, Pepperdata
    Leveraging APM to Overcome Big Data Development and Infrastructure Performance Challenges

    While businesses are deriving tremendous insights from ever-growing big data sets, development teams are challenged with increasingly resource-hungry workloads and overwhelming bottlenecks that impact productivity. This makes big data application performance management (APM) a must-have in today’s ecosystem. Join us to learn how APM can help enterprises overcome development and performance challenges associated with growing big data stores.
    Attendees will learn:
    - What is driving the demand for big data in application development
    - Challenges application developers face when working with increasingly larger workloads
    - How APM can mitigate these and other challenges, improve workflow productivity, and optimize resource effectiveness
  • Optimizing BI Workloads with Pepperdata Recorded: Dec 11 2018 36 mins
    Pepperdata
    BI workloads are an increasingly important part of your big data system and typically consist of large queries that analyze huge amounts of data. Because of this, BI users frequently complain about the responsiveness of their applications.

    Learn how Pepperdata enables you to tune your big data system and applications to meet SLAs for critical BI workloads.
  • Pepperdata Helps Clearsense Ensure 99.999% Uptime and Maximize Life-Saving Apps Recorded: Nov 14 2018 46 mins
    Charles Boicey, Clearsense Chief Innovation Officer
    Clearsense is a healthcare technology company that helps its customers realize measurable value from data with real-time analytics. Clearsense collects patient information — from monitors, ventilators and other biomedical devices — and provides real-time views of patient conditions and changes for early detection and prevention. With no room for downtime, Clearsense relies on Pepperdata to help them ensure uptime and optimize application performance.

    Join Pepperdata and Clearsense Chief Innovation Officer Charles Boicey for this informative webinar.

    Learn how Clearsense relies on Pepperdata to:
    - Ensure 99.999% uptime for life-saving applications
    - Enable clinicians to better monitor and alert on health issues and avoid catastrophic events
    - Provide customers with fast and reliable access to data and analytics
    - Run applications at maximum efficiency
    - Plan accurately for growth
    - And more

    “We have no tolerance for downtime, which is why we use Pepperdata.”
    ~ Clearsense Chief Innovation Officer Charles Boicey
  • Operations Manager Q and A – Do More with Your Big Data Platform Recorded: Oct 24 2018 24 mins
    Alex Pierce, Field Engineer
    Organizations are faced with countless obstacles to achieving big data success, including platform, application and user issues, as well as limited resources. This webinar will answer operational management questions around optimizing performance and maximizing capacity, such as “Who’s blowing up our cluster?”, “How can I run more applications?” and more. You will learn from our expert, based on real-world deployments, how a complete APM solution provides:

    – Reduced mean time to problem resolution.
    – An accurate understanding of the most expensive users.
    – Improved platform throughput, uptime, efficiency and performance.
    – Reduced backlog.
    – And more.

    Presenter

    Alex Pierce joined Pepperdata in 2014. Previously, he worked as a senior solution architect at WanDisco. Before that, he was the senior solution architect at Red Hat. Alex has a strong background in system administration and big data.
  • Capacity Manager Q and A – How to Improve Productivity, Throughput, and Uptime Recorded: Oct 10 2018 34 mins
    Kirk Lewis, Field Engineer
    There are numerous challenges to leveraging your big data infrastructure for optimal performance. This webinar answers operational management questions around optimizing performance and maximizing capacity, such as “Who’s blowing up our cluster?”, “How can I run more applications?” and more. You will learn from our expert, based on real-world deployments, how a complete APM solution delivers:

    – Improved throughput, uptime, efficiency and performance.
    – Accurate capacity planning.
    – Accurate capacity deployment for predictable performance.
    – Recaptured wasted resources to maximize current infrastructure.

    Presenter

    Kirk joined Pepperdata in 2015. Previously, he was a Solutions Engineer at StackVelocity. Before that he was the lead technical architect for big data production platforms at American Express. Kirk has a strong background in big data.
  • Developer Q and A – Improve Your Application Performance and Efficiency Recorded: Sep 26 2018 21 mins
    Alex Pierce, Field Engineer
    Developers are faced with specific challenges to big data success, including poor performance, unpredictable runtimes, and bottlenecks. This webinar will focus on answering questions you need answers to, like “Why is my job running so slow?” and “How do I guarantee SLAs of my app?” Our expert will answer these questions and more, with insight from real-world deployments. Learn how to achieve:

    – Improved application performance and efficiency.
    – Reduced troubleshooting time.
    – Improved resource utilization.
    – Insight on cluster events impacting applications.
    – And more.

    Presenter

    Alex joined Pepperdata in 2014. Previously, he worked as a senior solution architect at WanDisco. Before that, he was the senior solution architect at Red Hat. Alex has a strong background in system administration and big data.
  • Four Ways Operators Can Fix Slowdowns and Improve Big Data Cluster Performance Recorded: Aug 22 2018 28 mins
    Kirk Lewis
    Despite tremendous progress, there are critically important areas, including multi-tenancy, performance optimization, and workflow monitoring, where DevOps teams still need management help.

    In this online webinar, followed by a live Q and A, Pepperdata Field Engineer Kirk Lewis discusses why big data clusters slow down, how to fix them, and how to keep them running at an optimal level, including:

    • How Pepperdata Cluster Analyzer helps operators overcome Hadoop and Spark performance limitations by monitoring all facets of cluster performance in real time, including CPU, RAM, disk I/O, and network usage by user, job, and task.

    • How Pepperdata Capacity Optimizer increases capacity utilization by 30-50% without adding new hardware

    • How Pepperdata adaptively and automatically tunes the cluster based on real-time resource utilization with performance improvement results that cannot be achieved through manual tuning.
  • How Operations Performance Management Informs APM to Help Big Data Developers Recorded: Aug 8 2018 17 mins
    Alex Pierce
    Continually tuning your applications isn’t the best APM scenario for big data developers. This webinar discusses how big data operations performance management (OPM) provides the necessary context for more robust APM. Hardware is a routine element of running a big data platform. The OPM resource usage metrics that give operators clarity when a problem is not a clear-cut CPU, memory, or I/O bottleneck are also extremely useful to developers, and can provide them with notifications and alerts for hardware-related issues such as network errors on a specific network interface.

    This webinar discusses how big data total performance management (TPM) combines operations performance management (OPM) and applications performance management (APM).

    Pepperdata® Application Spotlight is a self-service APM portal that provides developers with a consolidated view to improve troubleshooting and optimize application performance. Application Spotlight enables big data application developers to quickly and easily improve performance with more relevant application information, performance recommendations, insights, and calls to action, all in one place. In addition to helping them make their jobs go faster, Application Spotlight enables developers to be better tenants in multi-tenant clusters and shows them how to write optimal performing jobs and more efficiently use their queue and cluster resources with practical, innovative application performance management solutions.
  • Application Performance Management (APM) for Big Data Apps and Infrastructure Recorded: Jul 18 2018 37 mins
    Kirk Lewis, Pepperdata Field Engineer
    Pepperdata Application Spotlight analyzes all Hadoop and Spark jobs running on the cluster and provides developers with technical insights on how each job performed. Intended for software engineers, developers, and technical leads who develop Spark applications, this webinar demonstrates how Application Spotlight helps developers quickly improve application performance, reduce resource usage, and understand application failures.

    Learn how developers can:

    – Maximize performance, improve productivity, guarantee reliability, and improve ROI
    – Generate application-specific recommendations to improve application performance
    – Highlight applications that need attention
    – Automatically identify bottlenecks, and alert on duration, failure conditions, and resource usage
    – Search for any applications running on the cluster, compare current and previous runs
    – Visualize Spark applications and their stages for easy root cause failure analysis and performance tuning

    Presenter Bio: Field Engineer, Kirk Lewis

    Kirk Lewis joined the Pepperdata team in 2015. Previously, he was a Solutions Engineer at StackVelocity. Before that he was the lead technical architect for big data production platforms at American Express. Kirk has a strong background in big data.
  • Diagnosing Application Failures and Errors Recorded: May 16 2018 40 mins
    Kirk Lewis
    This webinar will present the results of analyzing many Hadoop and Spark jobs on many multi-tenant production clusters. We will cover common issues seen, the symptoms of those issues, and how to address them. We will discuss the Pepperdata APM solution and discuss best practices for diagnosing application failures and errors.

    Pepperdata has gathered trillions of performance data points on production clusters running Hadoop and Spark, covering a variety of industries, applications, and workload types. We will present key performance insights — best and worst practices, gotchas, and tuning recommendations — based on analyzing the behavior and performance of millions of applications.
  • Building a Big Data Stack on Kubernetes Recorded: May 2 2018 48 mins
    Sean Suchter
    There is growing interest in running Apache Spark natively on Kubernetes (see https://github.com/apache-spark-on-k8s/spark). Intended for software engineers, developers, architects and technical leads who develop Spark applications, this session will discuss how to build a big data stack on Kubernetes. In particular, Sean will demonstrate:

    – The official Apache Spark 2.3 Kubernetes integration
    – How the Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping of Kubernetes containers to physical nodes to HDFS datanode daemons
    – How you can provide Spark with the high availability of the critical HDFS namenode service when running HDFS in Kubernetes
Performance Management for Big Data
Pepperdata is the Big Data performance company. Fortune 1000 enterprises depend on Pepperdata to manage and optimize the performance of Hadoop and Spark applications and infrastructure. Developers and IT Operations use Pepperdata solutions to diagnose and solve performance problems in production, increase infrastructure efficiencies, and maintain critical SLAs. Pepperdata automatically correlates performance issues between applications and operations, accelerates time to production, and increases infrastructure ROI. Pepperdata works with customer Big Data systems on-premises and in the cloud.

  • Title: Five Mistakes to Avoid When Using Spark
  • Live at: May 15 2019 5:00 pm
  • Presented by: Alex Pierce, Pepperdata Field Engineer