In this webcast, Reynold Xin from Databricks will be speaking about Apache Spark's new 2.0 major release.
The major themes for Spark 2.0 are:
- Unified APIs: Emphasis on building up higher level APIs including the merging of DataFrame and Dataset APIs
- Structured Streaming: Simplify streaming by building continuous applications on top of DataFrames, which allows us to unify streaming, interactive, and batch queries.
- Tungsten Phase 2: Speed up Apache Spark by 10X
Recorded May 5 2016, 61 mins
Nauman Fakhar, Solutions Architect at Databricks; Brett Olmstead, Data Scientist at DataRobot
On-Demand Webinar
Creating and testing machine learning models is time-consuming, and delivering results in production at high volumes is prone to error. Automating the development of models not only helps companies reduce errors but also uncovers powerful business insights faster.
Join our upcoming webinar, How to Automate Machine Learning and Scale Delivery, to learn how to:
Use automation to dynamically select optimal machine learning models for your use case
Automate the process of deploying models to production with high-volume data pipelines
Configure your automation to quickly scale up and down as data volumes change
Kevin Clugage, Sr Director, Partner Marketing, Databricks; Anna Shrestinian, Product Manager, Databricks
Azure Databricks is a Unified Analytics Platform built with a security-first mindset that enables you to run analytics and Machine Learning workloads at scale without compromising on security.
Join Anna Shrestinian, Product Manager at Databricks, and Kevin Clugage, Sr Director of Partner Marketing, as they describe how a number of Azure-specific features fit into the Databricks model for data security, and how to use them with best practices designed to make it easier to manage and operate a secure environment. We will also feature a live demo of Azure Databricks with a few concrete examples of these capabilities in action.
In this webinar, you will learn:
How to use Platform Security features for networking and storage with Azure Databricks such as VNET Injection, No Public IPs and Encryption.
How to deploy, operate and govern at scale, handling authentication and authorization with Azure Databricks using Azure Active Directory single sign-on, Azure Data Lake Storage Gen2 credential passthrough and integration with Azure Key Vault.
Denny Lee, Global Developer Advocate at Databricks
Deep Learning has shown tremendous success, but what makes it so special? What are neural networks, and how do they work? What are the differences between popular Deep Learning frameworks like Keras or TensorFlow, and where should you start?
In this webinar, we will show the relationship between neural networks and simpler ML models, and understand what gives neural networks their expressive power.
We will bring these concepts to life and demonstrate how Databricks simplifies deep learning by letting you quickly access ready-to-use ML environments, prepare data, and train models faster.
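To make the link between neural networks and simpler ML models concrete, here is a minimal sketch in plain Python. It is not taken from the webinar; the function names and the hand-picked weights are illustrative assumptions. A single sigmoid neuron has the same form as logistic regression, so it can only draw one linear boundary. Adding a small hidden layer lets the network compute XOR, which no single neuron can.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """One sigmoid neuron: a weighted sum passed through a sigmoid,
    which is the same functional form as logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def relu(z):
    return max(0.0, z)

def tiny_network(x):
    """Two ReLU hidden units and a linear output.
    The weights are hand-picked (not trained) so the network computes XOR."""
    h1 = relu(x[0] + x[1])        # number of active inputs
    h2 = relu(x[0] + x[1] - 1.0)  # non-zero only when both inputs are 1
    return h1 - 2.0 * h2

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# A single neuron can represent AND (a linearly separable function)...
and_outputs = [neuron(x, (10.0, 10.0), -15.0) > 0.5 for x in inputs]

# ...while XOR needs the hidden layer.
xor_outputs = [tiny_network(x) for x in inputs]
```

The hidden layer is what adds expressive power here: each hidden unit bends the input space once, and the output layer combines those pieces into a boundary that is no longer a single straight line.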
About the presenter:
Denny Lee is a Global Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Master's in Biomedical Informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers.
See how Apple, Finra, FIS, Overstock.com, Hewlett-Packard, Shell, Hotels.com and many others overcame these challenges to connect data science and data engineering using Databricks, the company founded by the original creators of Apache Spark™. The results? Faster performance, scaled data processes, simplified infrastructure, streamlined workflows, and greater collaboration.
Netflix is bringing engaging, culturally diverse stories to people all across the globe. With each original movie or TV show, we learn more about what our members want – and rest assured, they want an increasingly broad, deep, global, dynamic library!
Whether it’s planning how to satisfy these global tastes with the right content portfolio or personalizing recommendations to each member, Netflix relies heavily on data science techniques. This talk will highlight some of our core data science strategies and applications involving predictive models & algorithms, experimentation, and analytics.
In the last year, Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about the team's experience of exponential growth in data science algorithms while at the same time migrating from SAS / SQL to Spark as the core underlying architecture, moving from on-premise to the cloud, and transforming the capability of the data science function. He will also highlight the key enablers that have made this successful, including CEO support, the internal concepts of organic intelligence, and how Databricks has helped make this happen, as well as the pitfalls on the journey.
Denny Lee, Global Developer Advocate at Databricks
In this Deep Learning Fundamental Series Part 2, we will cover the principles for training your neural network including activation and loss functions, batch sizes, data normalization, and validation datasets.
All these concepts will be brought to life by demonstrating how Databricks simplifies deep learning - letting you quickly access ready-to-use ML environments, as well as prepare data, and train models faster. After this session, if requested, you will receive the presentation and associated notebooks so you can run the samples yourself.
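The training principles listed above can be sketched in a few lines of plain Python. This is an illustrative toy, not the notebook from the session: the data set, the learning rate, the batch size and every function name are assumptions chosen for the example. It standardizes a feature, holds out a validation set, and fits a one-weight logistic model with mini-batch gradient descent on a cross-entropy loss.

```python
import math
import random

def standardize(values):
    """Data normalization: rescale a feature to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def sigmoid(z):
    """Activation function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, data):
    """Loss function: mean binary cross-entropy over (x, y) pairs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(train_set, epochs=50, batch_size=8, lr=0.5, seed=0):
    """Mini-batch gradient descent: one parameter update per batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rows = train_set[:]
        rng.shuffle(rows)
        for i in range(0, len(rows), batch_size):
            batch = rows[i:i + batch_size]
            grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in batch) / len(batch)
            grad_b = sum(sigmoid(w * x + b) - y for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data (assumed for illustration): the label is 1 when the raw feature is >= 50.
raw = [float(v) for v in range(100)]
labels = [1 if v >= 50 else 0 for v in raw]
data = list(zip(standardize(raw), labels))
random.Random(1).shuffle(data)

# Validation dataset: 20% held out and never used for parameter updates.
train_set, val_set = data[:80], data[80:]

w, b = train(train_set)
train_loss = log_loss(w, b, train_set)
val_loss = log_loss(w, b, val_set)
```

Comparing `train_loss` with `val_loss` is the basic check the validation set exists for: a validation loss far above the training loss suggests the model is memorizing rather than generalizing.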
Comcast is the largest cable and internet provider in the US, reaching more than 30 million customers. Over the last couple of years, Comcast has transformed the customer experience using machine learning. For example, Comcast uses machine learning to power the X1 voice remote, which was used over 8B times in 2018 by our customers to find something they love to watch, get the latest sports statistics, control their home, or check their bill and troubleshoot their service using natural language.
What all these different applications have in common is that, to create and operate the machine learning models powering them, we need to ingest many TBs of data on a daily basis in an efficient and resilient manner. We also need a machine learning platform that allows for fast exploration of new ideas while automatically deploying the resulting machine learning models into a production environment that can handle Comcast scale.
In this talk we describe our data and machine learning infrastructure built on the Databricks Unified Analytics Platform, including how Databricks Delta is used for the ingest and initial processing of the raw telemetry from our video and voice applications and devices. We then explain how this data can be used both by the product organizations to gain deeper insights into how our products are being used, and by our research and engineering teams to train and fuel the machine learning models at the heart of these products. This keynote will also include an end-to-end demonstration of our machine learning platform, which is centered around Databricks and MLflow, and how it integrates with other open source machine learning frameworks such as TensorFlow, PyTorch, scikit-learn, H2O and Kubeflow, to name a few.
In this webinar, we will cover some of the latest innovations brought into the Databricks Unified Analytics Platform for Machine Learning, including:
Get started quickly using the Databricks Runtime 5.0 for Machine Learning, which provides pre-configured Databricks clusters including the most popular ML frameworks and libraries, Conda support, performance optimizations, and more.
Track, tune, and manage models, from experimentation to production, with MLflow, an open-source framework for the end-to-end Machine Learning lifecycle that allows data scientists to track experiments, share and reuse projects, and deploy models quickly, locally or in the cloud.
Scale up deep learning training workloads from a single machine to large clusters for the most demanding applications using the new HorovodRunner.
Matei Zaharia, Co-Founder and Chief Technologist at Databricks, Denny Lee, Technical Product Marketing Manager at Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In our webinar, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
We will show how to:
- Keep track of experiment runs and results across popular frameworks, including TensorFlow, with MLflow Tracking
- Execute an MLflow Project published on GitHub from the command line or a Databricks notebook, as well as remotely execute your project on a Databricks cluster
- Quickly deploy MLflow Models on-prem or in the cloud and expose them via REST APIs
HP has invested in a new product delivery paradigm called Device as a Service (DaaS). The success of the DaaS investment depends on automating the delivery, monitoring, replacement, user interaction, and servicing of the device. At the core of DaaS is a set of Virtual Assistants that optimize the cost and the user experience, assuring customer satisfaction with an aggressive cost model. The key takeaway from this presentation is how HP is using the Databricks Unified Analytics Platform to develop Virtual Assistants to change the workplace. Additionally, John will cover HP's approach to developing AI on Apache Spark™ and why HP chose Spark as a core technology for AI.
Brooke Wenig, Data Science Solutions Consultant at Databricks, Siddarth Murching, Software Engineer at Databricks
Deep Learning has shown tremendous success, and as we all know, the more data the better the models. However, we eventually hit a bottleneck on how much data we can process on a single machine. This necessitates a new way of training neural networks: in a distributed manner.
In this webinar, we walk through how to use TensorFlow™ and Horovod (an open-source library from Uber to simplify distributed model training) on Databricks to build a more effective recommendation system at scale. We will cover:
- The new Databricks Runtime for ML, shipped with pre-installed libraries such as Keras, TensorFlow, Horovod, and XGBoost to enable data scientists to get started with distributed Machine Learning more quickly
- The newly-released HorovodEstimator API for distributed, multi-GPU training of deep learning models against data in Apache Spark™
- How to make predictions at scale with deep learning pipelines
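The idea behind training in a distributed manner can be sketched without any cluster. The example below is a single-process simulation in plain Python and does not use the Horovod or HorovodEstimator APIs; the helper names, the toy data and the learning rate are all assumptions. Each simulated worker computes a gradient on its own shard of the data, the gradients are averaged (the role an allreduce plays in data-parallel training), and every worker applies the same update.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up with the average."""
    return sum(values) / len(values)

def distributed_step(w, shards, lr):
    # On real hardware each shard's gradient is computed on a separate worker.
    local_grads = [gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(local_grads)

# Toy data (assumed): y = 3x, split into two equal-size shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0::2], data[1::2]]

w = 0.0
for _ in range(200):
    w = distributed_step(w, shards, lr=0.01)
```

With equal-size shards, the averaged gradient equals the gradient over the full data set, so the simulated workers follow the same trajectory as single-machine training while each one only touches its own part of the data.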
Deepsha Menghani, Prod Mktg Mgr at Microsoft; Dhruv Kumar, Solutions Architect; Brian Dirking, Sr. Dir of Partner Mktg
Real-time analytics is crucial to many use cases. Apache Spark™ provides the framework and high-volume analytics to get answers from your streaming data. Join us in this webinar and see a demonstration of how to build IoT and Clickstream Analytics Notebooks in Azure Databricks. These Notebooks will use Python and SQL code to capture data from Azure Event Hubs and Azure IoT Hub, parse the data, and make it available to run in machine learning models. See how your organization can start taking advantage of your streaming data.
Arsalan Tavakoli-Shiraji, VP of Solutions; Justin Olsson, Senior Legal Counsel and Michael Armbrust, Software Engineer
With GDPR enforcement rapidly approaching on May 25, many companies are still trying to figure out how to comply with one of the regulation’s biggest pain points - data subject requests (DSRs). Under GDPR, data subjects (individuals) in the EU have the right to request information on what personal data is collected, how it is being used, and to have that data changed or erased.
For many organizations that rely on data lakes to store their big data, sifting through millions of files to locate and modify records for a DSR is at minimum a massive effort. And trying to do this within prescribed timelines is near impossible.
Fortunately there’s a path forward. Through an optimized approach to data management, Databricks powered by Apache Spark™ makes it easy to quickly find, edit and erase data submerged deep within your data lake without disrupting your data pipelines.
Join this webinar to learn:
• The GDPR requirements of data subject requests
• The challenges big data and data lakes create for organizations
• How Databricks Delta, a powerful new offering within the Databricks Unified Analytics Platform, improves data lake management and makes it possible to quickly find and surgically remove or modify individual records
• Best practices for GDPR data governance
• Live demo on how to easily fulfill data requests with Databricks
Sandy will highlight some key aspects of the new Spark-as-a-Service offering from Databricks in Azure, using Databricks notebooks to showcase loading and cleaning data in SQL and Scala, exploration, and everything through to putting a model into production.
Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership.