Apache® Spark™ MLlib 2.x: Migrating ML Workloads to DataFrames

In the Apache® Spark™ 2.x releases, Machine Learning (ML) is focusing on DataFrame-based APIs. This webinar is aimed at helping users take full advantage of the new APIs. Topics will include migrating workloads from RDDs to DataFrames, ML persistence for saving and loading models, and the roadmap ahead.
Recorded Dec 8 2016 61 mins
Presented by
Joseph K. Bradley and Jules S. Damji

  • How to Automate Machine Learning and Scale Delivery Jan 21 2020 3:00 pm UTC 62 mins
    Nauman Fakhar, Solutions Architect at Databricks; Brett Olmstead, Data Scientist at DataRobot
    On-Demand Webinar
    Creating and testing machine learning models is time-consuming, and delivering results in production at high volumes is error-prone. Automating model development not only helps companies reduce errors but also uncovers powerful business insights faster.

    Join our upcoming webinar, How to Automate Machine Learning and Scale Delivery, to learn how to:
    - Use automation to dynamically select optimal machine learning models for your use case
    - Automate the process of deploying models to production with high-volume data pipelines
    - Configure your automation to quickly scale up and down as data volumes change
  • Harness the Power of Real-Time Analytics for Rapid Decision-Making Jan 16 2020 4:00 am UTC 50 mins
    Stephen Glasworth, Chief Data Officer, Quby; Thor List, Senior Solutions Architect, Databricks, Inc.; Chris Burns, Partner Solu
    To remain competitive in today’s real-time world, it’s no longer enough to know what happened yesterday or even in the last hour. Businesses need access to data now, as it’s being generated, to help them make faster and more precise decisions. With AWS, you can easily and cost-effectively ingest, process, and analyze real-time, streaming data, such as application logs, website click streams, video, and IoT telemetry, at any scale, to create actionable insights in seconds.

    In this webinar, you’ll learn:
    - How your organization can leverage real-time, streaming analytics to capitalize on time-sensitive opportunities, meet customer demands, and reduce operational risks
    - How to tackle the architectural and business challenges of transitioning from batch to real-time analytics
    - How these solutions can help your organization quickly and cost-effectively implement a real-time analytics strategy
  • Data Engineering Best Practices Jan 14 2020 4:00 am UTC 57 mins
    Suraj Acharya, Director, Engineering; Singh Garewal, Director of Product Marketing
    Making quality data available in a reliable manner is a major determinant of success for data analytics initiatives, be they regular dashboards or reports, or advanced analytics projects drawing on state-of-the-art machine learning techniques. Data engineers tasked with this responsibility need to take account of a broad set of dependencies and requirements as they design and build their data pipelines.

    Join Suraj Acharya, Director, Engineering at Databricks, and Singh Garewal, Director of Product Marketing, as they discuss the modern IT/data architecture that a data engineer must operate within, data engineering best practices they can adopt, and desirable characteristics of tools to deploy.

    In this webinar you will learn:
    - A framework for describing the modern data architecture
    - Best practices for executing data engineering responsibilities
    - Characteristics to look for when making technology choices
  • What is Deep Learning? Deep Learning Fundamentals Jan 9 2020 4:30 am UTC 53 mins
    Denny Lee, Global Developer Advocate at Databricks
    In Part 2 of the Deep Learning Fundamentals series, we will cover the principles for training your neural network, including activation and loss functions, batch sizes, data normalization, and validation datasets.

    All these concepts will be brought to life by demonstrating how Databricks simplifies deep learning - letting you quickly access ready-to-use ML environments, as well as prepare data, and train models faster. After this session, if requested, you will receive the presentation and associated notebooks so you can run the samples yourself.
  • Security Best Practices 101 Dec 19 2019 4:00 am UTC 35 mins
    Kevin Clugage, Sr Director, Partner Marketing, Databricks; Anna Shrestinian, Product Manager, Databricks
    Azure Databricks is a Unified Analytics Platform built with a security-first mindset that enables you to run analytics and Machine Learning workloads at scale without compromising on security.

    Join Anna Shrestinian, Product Manager at Databricks, and Kevin Clugage, Sr Director of Partner Marketing, as they describe how a number of Azure-specific features fit into the Databricks model for data security, and how to use them with best practices designed to make it easier to manage and operate a secure environment. We will also feature a live demo of Azure Databricks showing a few concrete examples of these capabilities in action.

    In this webinar, you will learn:
    - How to use platform security features for networking and storage with Azure Databricks, such as VNET Injection, No Public IPs, and Encryption
    - How to deploy, operate, and govern at scale for authentication and authorization with Azure Databricks, using Azure Active Directory single sign-on, Azure Data Lake Storage Gen2 credential passthrough, and integration with Azure Key Vault
  • How to Automate Machine Learning and Scale Delivery Recorded: Dec 10 2019 62 mins
    Nauman Fakhar, Solutions Architect at Databricks; Brett Olmstead, Data Scientist at DataRobot
    On-Demand Webinar
    Creating and testing machine learning models is time-consuming, and delivering results in production at high volumes is error-prone. Automating model development not only helps companies reduce errors but also uncovers powerful business insights faster.

    Join our upcoming webinar, How to Automate Machine Learning and Scale Delivery, to learn how to:
    - Use automation to dynamically select optimal machine learning models for your use case
    - Automate the process of deploying models to production with high-volume data pipelines
    - Configure your automation to quickly scale up and down as data volumes change
  • Data Security Best Practices 101 Recorded: Dec 5 2019 35 mins
    Kevin Clugage, Sr Director, Partner Marketing, Databricks; Anna Shrestinian, Product Manager, Databricks
    Azure Databricks is a Unified Analytics Platform built with a security-first mindset that enables you to run analytics and Machine Learning workloads at scale without compromising on security.

    Join Anna Shrestinian, Product Manager at Databricks, and Kevin Clugage, Sr Director of Partner Marketing, as they describe how a number of Azure-specific features fit into the Databricks model for data security, and how to use them with best practices designed to make it easier to manage and operate a secure environment. We will also feature a live demo of Azure Databricks showing a few concrete examples of these capabilities in action.

    In this webinar, you will learn:
    - How to use platform security features for networking and storage with Azure Databricks, such as VNET Injection, No Public IPs, and Encryption
    - How to deploy, operate, and govern at scale for authentication and authorization with Azure Databricks, using Azure Active Directory single sign-on, Azure Data Lake Storage Gen2 credential passthrough, and integration with Azure Key Vault
  • Deep Learning Fundamentals: Introduction to Neural Networks Recorded: Nov 12 2019 59 mins
    Denny Lee, Global Developer Advocate at Databricks
    Deep Learning has shown tremendous success, but what makes it so special? What are neural networks, and how do they work? What are the differences between popular Deep Learning frameworks like Keras or TensorFlow, and where should you start?

    In this webinar, we will show the relationship between neural networks and simpler ML models, and explain what gives neural networks their expressive power.

    We will bring these concepts to life and demonstrate how Databricks simplifies deep learning by letting you quickly access ready-to-use ML environments, as well as prepare data and train models faster.

    About the presenter:
    Denny Lee is a Global Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers.
  • Accelerate Innovation by Unifying Data and AI Recorded: Oct 29 2019 57 mins
    Databricks
    See how Apple, Finra, FIS, Overstock.com, Hewlett-Packard, Shell, Hotels.com and many others overcame these challenges to connect data science and data engineering using Databricks, the company founded by the original creators of Apache Spark™. The results? Faster performance, scaled data processes, simplified infrastructure, streamlined workflows, and greater collaboration.
  • How Netflix Data Science Powers Global Entertainment Recorded: Oct 23 2019 22 mins
    Databricks & Netflix
    Netflix is bringing engaging, culturally diverse stories to people all across the globe. With each original movie or TV show, we learn more about what our members want – and rest assured, they want an increasingly broad, deep, global, dynamic library!

    Whether it’s planning how to satisfy these global tastes with the right content portfolio or personalizing recommendations to each member, Netflix relies heavily on data science techniques. This talk will highlight some of our core data science strategies and applications involving predictive models & algorithms, experimentation, and analytics.
  • Hotels.com’s Journey to Becoming an Algorithmic Business Recorded: Oct 15 2019 20 mins
    Databricks & Hotels.com
    In the last year Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about the team’s experience of exponential growth in data science algorithms while, at the same time, migrating from SAS/SQL to Spark as the core underlying architecture, moving from on-premise to the cloud, and transforming the capability of the data science function. He will also highlight the key enablers that have made this successful, including CEO support, the internal concepts of organic intelligence, and how Databricks has helped make this happen, as well as the pitfalls on the journey.
  • Deep Learning Fundamentals - Training your Neural Network Recorded: Oct 14 2019 54 mins
    Denny Lee, Global Developer Advocate at Databricks
    In Part 2 of the Deep Learning Fundamentals series, we will cover the principles for training your neural network, including activation and loss functions, batch sizes, data normalization, and validation datasets.

    All these concepts will be brought to life by demonstrating how Databricks simplifies deep learning - letting you quickly access ready-to-use ML environments, as well as prepare data, and train models faster. After this session, if requested, you will receive the presentation and associated notebooks so you can run the samples yourself.
  • Winning the Audience with AI: Comcast’s Journey to Building an Agile Data and AI Recorded: Oct 8 2019 17 mins
    Databricks & Comcast
    Comcast is the largest cable and internet provider in the US, reaching more than 30 million customers. Over the last couple of years, Comcast has transformed the customer experience using machine learning. For example, Comcast uses machine learning to power the X1 voice remote, which our customers used over 8B times in 2018 to find something they love to watch, get the latest sports statistics, control their home, or check their bill and troubleshoot their service using natural language.

    What all these applications have in common is that, to create and operate the machine learning models powering them, we need to ingest many TBs of data on a daily basis in an efficient and resilient manner, and we need a machine learning platform that allows for fast exploration of new ideas while automatically deploying the resulting machine learning models into a production environment that can handle Comcast scale.

    In this talk we describe our data and machine learning infrastructure built on the Databricks Unified Analytics Platform, including how Databricks Delta is used for the ingest and initial processing of the raw telemetry from our video and voice applications and devices. We then explain how this data can be used both by the product organizations to gain deeper insights into how our products are being used and by our research and engineering teams to train and fuel the machine learning models at the heart of these products. This keynote will also include an end-to-end demonstration of our machine learning platform, which is centered around Databricks and MLflow, and how it integrates with other open source machine learning frameworks such as TensorFlow, PyTorch, scikit-learn, H2O, and Kubeflow, to name a few.
  • Accelerating Machine Learning Recorded: Oct 4 2019 58 mins
    Databricks
    In this webinar, we will cover some of the latest innovations brought into the Databricks Unified Analytics Platform for Machine Learning, including:

    Get started quickly using the Databricks Runtime 5.0 for Machine Learning, which provides pre-configured Databricks clusters including the most popular ML frameworks and libraries, Conda support, performance optimizations, and more.

    Track, tune, and manage models, from experimentation to production, with MLflow, an open-source framework for the end-to-end Machine Learning lifecycle that allows data scientists to track experiments, share and reuse projects, and deploy models quickly, locally or in the cloud.

    Scale up deep learning training workloads from a single machine to large clusters for the most demanding applications using the new HorovodRunner.
  • Winning the Audience with AI: Comcast’s Journey to Building an Agile Data and AI Recorded: Sep 5 2019 18 mins
    Databricks & Comcast
    Comcast is the largest cable and internet provider in the US, reaching more than 30 million customers. Over the last couple of years, Comcast has transformed the customer experience using machine learning. For example, Comcast uses machine learning to power the X1 voice remote, which our customers used over 8B times in 2018 to find something they love to watch, get the latest sports statistics, control their home, or check their bill and troubleshoot their service using natural language.

    What all these applications have in common is that, to create and operate the machine learning models powering them, we need to ingest many TBs of data on a daily basis in an efficient and resilient manner, and we need a machine learning platform that allows for fast exploration of new ideas while automatically deploying the resulting machine learning models into a production environment that can handle Comcast scale.

    In this talk we describe our data and machine learning infrastructure built on the Databricks Unified Analytics Platform, including how Databricks Delta is used for the ingest and initial processing of the raw telemetry from our video and voice applications and devices. We then explain how this data can be used both by the product organizations to gain deeper insights into how our products are being used and by our research and engineering teams to train and fuel the machine learning models at the heart of these products. This keynote will also include an end-to-end demonstration of our machine learning platform, which is centered around Databricks and MLflow, and how it integrates with other open source machine learning frameworks such as TensorFlow, PyTorch, scikit-learn, H2O, and Kubeflow, to name a few.
  • How Netflix Data Science Powers Global Entertainment Recorded: Sep 5 2019 23 mins
    Databricks & Netflix
    Netflix is bringing engaging, culturally diverse stories to people all across the globe. With each original movie or TV show, we learn more about what our members want – and rest assured, they want an increasingly broad, deep, global, dynamic library!

    Whether it’s planning how to satisfy these global tastes with the right content portfolio or personalizing recommendations to each member, Netflix relies heavily on data science techniques. This talk will highlight some of our core data science strategies and applications involving predictive models & algorithms, experimentation, and analytics.
  • Hotels.com’s Journey to Becoming an Algorithmic Business Recorded: Sep 5 2019 21 mins
    Databricks & Hotels.com
    In the last year Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about the team’s experience of exponential growth in data science algorithms while, at the same time, migrating from SAS/SQL to Spark as the core underlying architecture, moving from on-premise to the cloud, and transforming the capability of the data science function. He will also highlight the key enablers that have made this successful, including CEO support, the internal concepts of organic intelligence, and how Databricks has helped make this happen, as well as the pitfalls on the journey.
  • Accelerating Machine Learning Recorded: Sep 5 2019 59 mins
    Databricks
    In this webinar, we will cover some of the latest innovations brought into the Databricks Unified Analytics Platform for Machine Learning, including:

    Get started quickly using the Databricks Runtime 5.0 for Machine Learning, which provides pre-configured Databricks clusters including the most popular ML frameworks and libraries, Conda support, performance optimizations, and more.

    Track, tune, and manage models, from experimentation to production, with MLflow, an open-source framework for the end-to-end Machine Learning lifecycle that allows data scientists to track experiments, share and reuse projects, and deploy models quickly, locally or in the cloud.

    Scale up deep learning training workloads from a single machine to large clusters for the most demanding applications using the new HorovodRunner.
  • Accelerate Innovation by unifying Data and AI Recorded: Jul 31 2019 57 mins
    Databricks
    See how Apple, Finra, FIS, Overstock.com, Hewlett-Packard, Shell, Hotels.com and many others overcame these challenges to connect data science and data engineering using Databricks, the company founded by the original creators of Apache Spark™. The results? Faster performance, scaled data processes, simplified infrastructure, streamlined workflows, and greater collaboration.
  • Introducing MLflow: Infrastructure for a Complete Machine Learning Lifecycle Recorded: Aug 30 2018 54 mins
    Matei Zaharia, Co-Founder and Chief Technologist at Databricks, Denny Lee, Technical Product Marketing Manager at Databricks
    ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.

    In our webinar, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

    We will show how to:
    - Keep track of experiment runs and results across popular frameworks, including TensorFlow, with MLflow Tracking
    - Execute an MLflow Project published on GitHub from the command line or a Databricks notebook, as well as remotely execute your project on a Databricks cluster
    - Quickly deploy MLflow Models on-prem or in the cloud and expose them via REST APIs

    Get started now at https://www.mlflow.org/
Making Big Data Simple
Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership.

  • Title: Apache® Spark™ MLlib 2.x: Migrating ML Workloads to DataFrames
  • Live at: Dec 8 2016 6:00 pm
  • Presented by: Joseph K. Bradley and Jules S. Damji