Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell

In this webcast, Patrick Wendell from Databricks will be speaking about Apache Spark's new 1.6 release.

Spark 1.6 will include (among other things) a type-safe API called Dataset, built on top of DataFrames, that leverages the work in Project Tungsten for more robust and efficient execution (including memory management, code generation, and query optimization) [SPARK-9999]; adaptive query execution [SPARK-9850]; and unified memory management, which consolidates cache and execution memory [SPARK-10000].
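
As a rough illustration of what consolidating cache and execution memory can mean, here is a minimal toy sketch in plain Python. It is not Spark's implementation: the class name `UnifiedMemoryPool`, its methods, and the `protected_storage` threshold are all hypothetical, and the model assumes that cached data may borrow any free memory while execution may reclaim cached memory down to a protected minimum.

```python
class UnifiedMemoryPool:
    """Toy model (hypothetical, not Spark code): execution and storage
    (cache) draw from one shared pool instead of two fixed regions."""

    def __init__(self, total, protected_storage):
        self.total = total
        # Assumption: cached data at or below this amount is never evicted.
        self.protected_storage = protected_storage
        self.execution_used = 0
        self.storage_used = 0

    def free(self):
        return self.total - self.execution_used - self.storage_used

    def acquire_storage(self, amount):
        """Cache may use any free memory, but never evicts execution."""
        if amount <= self.free():
            self.storage_used += amount
            return True
        return False

    def acquire_execution(self, amount):
        """Execution may evict cached data above the protected threshold.

        Returns the amount actually granted, which may be less than requested.
        """
        shortfall = amount - self.free()
        if shortfall > 0:
            evictable = max(0, self.storage_used - self.protected_storage)
            self.storage_used -= min(shortfall, evictable)
        granted = min(amount, self.free())
        self.execution_used += granted
        return granted


pool = UnifiedMemoryPool(total=100, protected_storage=30)
cached = pool.acquire_storage(80)        # cache borrows most of the pool
first_grant = pool.acquire_execution(50)   # evicts 30 units of cache
second_grant = pool.acquire_execution(40)  # can only evict down to 30
```

With two fixed regions, the first cache request above would either be refused or leave execution memory idle; with a shared pool, whichever side needs memory can use what the other is not using, within the assumed protection rule.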
Recorded Dec 1 2015 61 mins
Presented by
Patrick Wendell

  • Fast and Reliable ETL Pipelines with Databricks Mar 7 2018 6:00 pm UTC 60 mins
    Prakash Chockalingam, Product Manager at Databricks
    Building multiple ETL pipelines is complex and time consuming, which makes it an expensive endeavor. As the number of data sources and the volume of data increase, ETL time also increases, delaying the point at which an enterprise can derive value from the data.

    Join Prakash Chockalingam, Product Manager and data engineering expert at Databricks, to learn how to avoid the common pitfalls of data engineering and how the Databricks Unified Analytics Platform can ensure performance and reliability at scale to lower total cost of ownership (TCO).

    In this webinar, you will learn how Databricks can help to:
    - Remove infrastructure configuration complexity to reduce DevOps efforts
    - Optimize your ETL data pipelines for performance without compromising reliability
    - Unify data engineering and data science to accelerate innovation for the business.
  • Azure Databricks: Accelerating Innovation with Microsoft Azure and Databricks Recorded: Feb 15 2018 52 mins
    Brian Dirking, Senior Director of Partner Marketing at Databricks
    Data scientists and data engineers need a secure and scalable platform to collaborate on analytics. Register for this webinar and see how Azure Databricks provides a platform that enables teams to accelerate innovation, providing:

    - A collaborative workspace to experiment with models and datasets, and then put jobs into action instantly.
    - An automated infrastructure that enables you to autoscale compute and storage independently.

    The live demo portion of the webinar will show how Azure Databricks can bring in streaming data, run it in a machine learning model, and then output the results to PowerBI for visualization.
  • What's New in the Upcoming Apache Spark 2.3 Release? Recorded: Feb 8 2018 49 mins
    Reynold Xin, Chief Architect at Databricks, and Jules Damji, Spark Community and Developer Advocate
    The upcoming Spark 2.3 release marks a big step forward in speed, unification, and API support.

    Reynold Xin and Jules Damji from Databricks will walk through how you can benefit from the upcoming improvements:

    - New DataSource APIs that enable developers to more easily read and write data for Continuous Processing in Structured Streaming.
    - PySpark support for vectorization, giving Python developers the ability to run native Python code fast.
    - Improved performance by taking advantage of NVMe SSDs.
    - Native Kubernetes support, marrying the best of container orchestration and distributed data processing.
  • Ten Must-Haves to Deploy Machine Learning and AI in the Enterprise Recorded: Jan 25 2018 61 mins
    Forrester VP & Principal Analyst, Mike Gualtieri; Data Science Lead at Overstock, Chris Robison; PM at Overstock, Craig Kelly
    Enterprise data science teams are driving big innovations in machine learning, but this has put them under increased pressure to deliver more models, more frequently, and more rapidly.

    In this webinar, Forrester VP & Principal Analyst, Mike Gualtieri, will share data on the top trends in machine learning and lay out what data science teams need to do in order to maximize their output.

    Chris Robison, Head of Data Science at Overstock.com, and Craig Kelly, Group Product Manager at Overstock.com, will showcase how they used big data and machine learning to:

    -Create a one-to-one personalized shopping experience.
    -Decrease cost of moving models to production by nearly 50%.
    -Stand up new models 5x faster than before.
  • How Databricks helps iPass optimize for performance and availability Recorded: Jan 10 2018 60 mins
    Tomasz Magdanski, Director of Big Data and Analytics at iPass
    iPass is the world’s largest wifi network, serving over 160 network providers with more than 60 million hotspots in airports, hotels, airplanes, and public spaces in 120 countries across the globe.

    Analyzing the state of the world’s wifi in real time is a daunting task fraught with unpredictable challenges that can impact performance, reliability, and security. Join this webinar to learn why iPass moved from an on-premises Hadoop system to Databricks in the cloud and how they are able to deliver ground-breaking results with a small and nimble team.

    With Databricks, iPass can now focus on scalable business logic instead of building infrastructure. This newfound freedom has allowed their team to:
    -monitor the performance of millions of wifi hotspots around the world.
    -leverage machine learning and real-time analytics to understand the health of access points.
    -make recommendations to customers on the best access point to use to ensure optimal performance.
  • Continuous Integration & Continuous Delivery with Databricks Recorded: Dec 7 2017 45 mins
    Prakash Chockalingam, Product Manager at Databricks
    Continuous integration and continuous delivery (CI/CD) enable an organization to rapidly iterate on software changes while maintaining stability, performance, and security. Many organizations have adopted various tools to follow CI/CD best practices and improve developer productivity, code quality, and software delivery. However, following those best practices is still challenging for many big data teams.

    This webinar will highlight:
    *Key challenges in building a data pipeline for CI/CD.
    *Key integration points in a data pipeline's CI/CD cycle.
    *How Databricks facilitates iterative development, continuous integration and build.
  • Unified Data Management: The Best of Data Lakes, Data Warehouses and Streaming Recorded: Nov 16 2017 61 mins
    Jason Pohl, Software Engineer at Databricks, and Bill Chambers, Product Manager at Databricks
    Current data management architectures are a complex combination of siloed, single-purpose tools. There are data lakes, which offer low-cost storage but are difficult to use for data discovery; data warehouses, which are reliable and optimized for fast queries but become costly at scale; and various streaming and batch systems that shuffle data between them, often resulting in data integrity issues.

    Businesses have to create a patchwork of different tools, skillsets, and expertise just to solve one fundamental problem: How can I make data-driven decisions faster?

    Join this webinar to learn how Databricks Delta, a new unified data management system, takes advantage of the scale of a data lake, the reliability and performance of a data warehouse, and the low-latency updates of a streaming system, all in a unified and fully managed fashion.

    This webinar will cover:
    -How the need to process batch and streaming data creates challenges for enterprises with complex data architectures.
    -How Databricks Delta takes the best of data warehouses, data lakes and streaming systems to provide a highly scalable, performant, and reliable data management system.
    -A live demonstration of Databricks Delta to showcase how easy it is to cost-efficiently scale without impacting query performance.
  • 5 Keys to Build Machine Learning and Visualization Into Your Application Recorded: Nov 8 2017 51 mins
    Databricks, Handshake, and Looker
    Machine learning has unlocked new possibilities that deliver significant business value. However, most companies don’t have the resources to either build and maintain the supporting infrastructure or apply data science to build a smarter solution.

    Join us for this webinar and hear from John Huang, engineering and data analytics lead at Handshake, as he shares how he quickly and cost-effectively scaled a small engineering team to build a machine-learning-powered recommendation engine that profiles users and behaviors to present relevant next steps. In this webinar you will learn how to:

    -Simplify and accelerate data engineering processes including data ingest and ETL
    -Incorporate machine learning into your production application without an army of data scientists
    -Choose an analytics engine that will enable key analytics such as attribution, step analysis, and linear regression
    -Embed visualizations into your application that drive stickiness
  • How to Put Cluster Management on Autopilot Recorded: Oct 19 2017 49 mins
    Prakash Chockalingam, Product Manager at Databricks
    A key challenge in doing data engineering at scale is having a robust distributed infrastructure on which frameworks like Apache Spark can run efficiently. Beyond building the infrastructure, keeping it functioning automatically is another critical piece of running production workloads.

    Join this webinar to learn how Databricks’ Unified Analytics Platform can help simplify your data engineering problems by configuring your distributed infrastructure to be in autopilot mode. Learn how:
    -Databricks’ automated infrastructure will allow you to autoscale compute and storage independently.
    -To significantly reduce cloud costs through cutting edge cluster management features.
    -To control certain features in the cluster management and balance between ease of use and manual control.
  • How CardinalCommerce Significantly Improved Data Pipeline Speeds by 200% Recorded: Sep 21 2017 61 mins
    Christopher Baird from CardinalCommerce
    CardinalCommerce was acquired by Visa earlier this year for its critical role in payments authentication. Through predictive analytics and machine learning, Cardinal measures performance and behavior of the entire authentication process across checkout, issuing and ecosystem partners to recommend actions, reduce fraud and drive frictionless digital commerce.

    With Databricks, CardinalCommerce simplified data engineering to improve the performance of their ETL pipeline by 200% while reducing operational costs significantly via automation, seamless integration with key technologies, and improved process efficiencies.

    Join this webinar to learn how CardinalCommerce was able to:
    -Simplify access to data across the organization
    -Accelerate data processing by 200%
    -Reduce EC2 costs through faster performance and automated infrastructure
    -Visualize performance metrics to customers and stakeholders
  • Performance Benchmarking Big Data Platforms in the Cloud Recorded: Aug 22 2017 47 mins
    Reynold Xin, Co-founder and Chief Architect at Databricks
    Performance is often a key factor in choosing big data platforms. Over the past few years, Apache Spark has seen rapid adoption by enterprises, making it the de facto data processing engine for its performance and ease of use.

    Since starting the Spark project, our team at Databricks has been focusing on accelerating innovation by building the most performant and optimized Unified Analytics Platform for the cloud. Join Reynold Xin, Co-founder and Chief Architect of Databricks as he discusses the results of our benchmark (using TPC-DS industry standard requirements) comparing the Databricks Runtime (which includes Apache Spark and our DBIO accelerator module) with vanilla open source Spark in the cloud and how these performance gains can have a meaningful impact on your TCO for managing Spark.

    This webinar covers:
    - Differences between open source Spark and Databricks Runtime.
    - Details on the benchmark, including hardware configuration, dataset, etc.
    - Summary of the benchmark results, which reveal performance gains of up to 5x over open source Spark and other big data engines.
    - A live demo comparing processing speeds of Databricks Runtime vs. open source Spark.

    Special Announcement: We will also announce an experimental feature as part of the webinar that aims at drastically speeding up your workloads even more. Be the first to see this feature in action. Register today!
  • Productionizing Apache Spark™ MLlib Models for Real-time Prediction Serving Recorded: Aug 10 2017 52 mins
    Joseph Bradley and Sue Ann Hong
    Data science and machine learning tools traditionally focus on training models. When companies begin to employ machine learning in actual production workflows, they encounter new sources of friction such as sharing models across teams, deploying identical models on different systems, and maintaining featurization logic. In this webinar, we discuss how Databricks provides a smooth path for productionizing Apache Spark MLlib models and featurization pipelines.

    Databricks Model Scoring provides a simple API for exporting MLlib models and pipelines. These exported models can be deployed in many production settings, including:

    * External real-time low-latency prediction serving systems, without Spark dependencies,
    * Apache Spark Structured Streaming jobs, and
    * Apache Spark batch jobs.

    In this webinar, we overview our solution’s functionality, describe its architecture, and demonstrate how to use it to deploy MLlib models to production.
  • Build, Scale, and Deploy Deep Learning Pipelines with Ease Recorded: Jul 27 2017 62 mins
    Sue Ann Hong, Tim Hunter and Jules S. Damji
    Deep Learning has shown tremendous success, yet it often requires a lot of effort to leverage its power. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone in a distributed manner.

    This webinar is the first of a series in which we survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This package simplifies Deep Learning in three major ways:

    1. It has a simple API that integrates well with enterprise Machine Learning pipelines.
    2. It automatically scales out common Deep Learning patterns, thanks to Spark.
    3. It enables exposing Deep Learning models through the familiar Spark APIs, such as MLlib and Spark SQL.

    In this webinar, we will look at a complex problem of image classification, using Deep Learning and Spark. Using Deep Learning Pipelines, we will show:

    * how to build deep learning models in a few lines of code;
    * how to scale common tasks like transfer learning and prediction; and
    * how to publish models in Spark SQL.
  • Accelerate Data Science with Better Data Engineering with Databricks Recorded: Jul 13 2017 63 mins
    Andrew Candela
    Whether you’re processing IoT data from millions of sensors or building a recommendation engine to provide a more engaging customer experience, the ability to derive actionable insights from massive volumes of diverse data is critical to success. MediaMath, a leading adtech company, relies on Apache Spark to process billions of data points ranging from ads, user cookies, impressions, clicks, and more — translating to several terabytes of data per day. To support the needs of the data science teams, data engineering must build data pipelines for both ETL and feature engineering that are scalable, performant, and reliable.

    Join this webinar to learn how MediaMath leverages Databricks to simplify mission-critical data engineering tasks that surface data directly to clients and drive actionable business outcomes. This webinar will cover:

    - Transforming TBs of data with RDDs and PySpark responsibly
    - Using the JDBC connector to write results to production databases seamlessly
    - Comparisons with a similar approach using Hive
  • How Databricks and Machine Learning is Powering the Future of Genomics Recorded: May 25 2017 59 mins
    Frank Austin Nothaft, Genomics Data Engineer at Databricks
    With the drastic drop in the cost of sequencing a single genome, many organizations across biotechnology, pharmaceuticals, biomedical research, and agriculture have begun to make use of genome sequencing. While the sequence of a single genome may provide insight about the individual who was sequenced, to derive maximal insight from the genomic data, the ultimate goal is to query across a cohort of many hundreds to thousands of individuals.

    Join this webinar to learn how Databricks — powered by Apache Spark — enables queries across a database of genomics in interactive time and simplifies the application of machine learning models and statistical tests to genomics data across patients, to derive more insight into the biological processes driven by genomic alterations.

    In this webinar, we will:

    - Demonstrate how Databricks can rapidly query annotated variants across a cohort of 1,000 samples.
    - Look at a case study using Databricks to improve the performance of running an expression quantitative trait loci (eQTL) test across samples from the GEUVADIS project.
    - Show how we can parallelize conventional genomics tools using Databricks.
  • Deploying Machine Learning Techniques at Petabyte Scale with Databricks Recorded: May 22 2017 61 mins
    Saket Mengle, Senior Principal Data Scientist at DataXu
    The central premise of DataXu is to apply data science to better marketing. At its core is the Real-time Bidding Platform, which processes 2 petabytes of data per day and responds to ad auctions at a rate of 2.1 million requests per second across 5 different continents. On top of this platform sits DataXu’s analytics engine, which gives clients insightful analytics reports that address their marketing business questions. Some common requirements for both platforms are the ability to do real-time processing, scalable machine learning, and ad-hoc analytics.

    This webinar will showcase DataXu’s successful use-cases of using the Apache® Spark™ framework and Databricks to address all of the above challenges while maintaining its agility and rapid prototyping strengths to take a product from initial R&D phase to full production.

    We will also discuss in detail:

    - Challenges of using Apache Spark in a petabyte-scale machine learning system and how we worked to solve them.
    - Best practices for large-scale Spark ETL processing and model testing, all the way through to interactive analytics.
  • Deep Learning on Apache® Spark™: Workflows and Best Practices Recorded: May 4 2017 47 mins
    Tim Hunter and Jules S. Damji
    The combination of Deep Learning with Apache Spark has the potential for tremendous impact in many sectors of the industry. This webinar, based on the experience gained in assisting customers with the Databricks Virtual Analytics Platform, will present some best practices for building deep learning pipelines with Spark.

    Rather than comparing deep learning systems or specific optimizations, this webinar will focus on issues that are common to deep learning frameworks when running on a Spark cluster, including:

    * optimizing cluster setup;
    * configuring the cluster;
    * ingesting data; and
    * monitoring long-running jobs.

    We will demonstrate the techniques we cover using Google’s popular TensorFlow library. More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters.

    Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput, and monitoring facilitates both the work of configuration and the stability of deep learning jobs.
  • Databricks Product Demonstration Recorded: Apr 19 2017 48 mins
    Don Hilborn
    This is a live demonstration of the Databricks virtual analytics platform.
  • How to Increase Data Science Agility at Scale with Databricks Recorded: Mar 30 2017 51 mins
    Maddie Schults
    Apache® Spark™ has become an indispensable tool for data science teams. Its performance and flexibility enable data scientists to do everything from interactive exploration and feature engineering to model tuning with ease. In this webinar, Maddie Schults, Databricks product manager, will discuss how Databricks allows data science teams to use Apache Spark for their day-to-day work.

    You will learn:

    - Obstacles faced by data science teams in the era of big data;
    - How Databricks simplifies Spark development;
    - A demonstration of key Databricks functionalities that help data scientists become more productive.
  • Databricks Product Demonstration Recorded: Mar 29 2017 63 mins
    Miklos Christine
    This is a live demonstration of the Databricks virtual analytics platform.
Making Big Data Simple
Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership.

  • Title: Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
  • Live at: Dec 1 2015 5:00 pm
  • Presented by: Patrick Wendell