
Databricks - The Unified Analytics Platform

  • Introducing MLflow: Infrastructure for a Complete Machine Learning Lifecycle
    Matei Zaharia, Co-Founder and Chief Technologist at Databricks; Denny Lee, Technical Product Marketing Manager at Databricks. Recorded: Aug 30 2018 (54 mins)
    ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.

    In our webinar, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

    We will show how to:
    - Keep track of experiment runs and results across popular frameworks, including TensorFlow, with MLflow Tracking (see the sketch after this list)
    - Execute an MLflow Project published on GitHub from the command line or a Databricks notebook, as well as remotely execute your project on a Databricks cluster
    - Quickly deploy MLflow Models on-prem or in the cloud and expose them via REST APIs
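
    To make the Tracking piece concrete, here is a minimal sketch using the public MLflow API (`pip install mlflow scikit-learn`); the model, parameter, and metric choices are illustrative, not from the webinar:

    ```python
    # Minimal MLflow Tracking sketch: log params, metrics, and a model artifact
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=200, n_features=4, random_state=42)

    with mlflow.start_run():
        # Log the hyperparameters so this run can be reproduced later
        mlflow.log_param("n_estimators", 50)
        model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)
        # Log a metric to compare runs side by side in the tracking UI
        mlflow.log_metric("train_r2", model.score(X, y))
        # Package the model so it can be served later, e.g. `mlflow models serve`
        mlflow.sklearn.log_model(model, "model")
    ```

    Running `mlflow ui` afterwards lets you browse the logged runs locally.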

    Get started now at https://www.mlflow.org/
  • From Data Prep to Deep Learning: How HP Unifies Analytics with Databricks
    Franco Vieira, Data Scientist at HP. Recorded: Jul 31 2018 (47 mins)
    HP has invested in a new product delivery paradigm called Device as a Service (DaaS). The success of the DaaS investment depends on automating the delivery, monitoring, replacement, user interaction, and servicing of the device. At the core of DaaS is a set of Virtual Assistants that optimize cost and the user experience, ensuring customer satisfaction under an aggressive cost model. The key takeaway from this presentation is how HP is using the Databricks Unified Analytics Platform to develop Virtual Assistants that change the workplace. Additionally, Franco will cover HP's approach to developing AI on Apache Spark™ and why HP chose Spark as a core technology for AI.
  • Scalable End-to-End Deep Learning using TensorFlow™ and Databricks
    Brooke Wenig, Data Science Solutions Consultant at Databricks; Siddarth Murching, Software Engineer at Databricks. Recorded: Jul 9 2018 (42 mins)
    Deep Learning has shown tremendous success, and as we all know, the more data, the better the model. However, we eventually hit a bottleneck in how much data we can process on a single machine. This necessitates a new way of training neural networks: in a distributed manner.

    In this webinar, we walk through how to use TensorFlow™ and Horovod (an open-source library from Uber to simplify distributed model training) on Databricks to build a more effective recommendation system at scale. We will cover:

    - The new Databricks Runtime for ML, shipped with pre-installed libraries such as Keras, TensorFlow, Horovod, and XGBoost to enable data scientists to get started with distributed Machine Learning more quickly (a Horovod sketch follows this list)
    - The newly released HorovodEstimator API for distributed, multi-GPU training of deep learning models against data in Apache Spark™
    - How to make predictions at scale with deep learning pipelines
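
    For a taste of the distributed-training pattern, below is a minimal Horovod-with-Keras sketch using Horovod's public API, launched with `horovodrun -np 4 python train.py`. Note this is plain Horovod, not the HorovodEstimator API itself (whose Spark-facing signature is not reproduced here), and the model and dataset are illustrative:

    ```python
    # Minimal data-parallel training with Horovod and tf.keras
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one process per GPU/CPU slot

    # Toy data; in practice each worker reads its own shard of the dataset
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # Scale the learning rate by the worker count and wrap the optimizer so
    # gradients are averaged across workers via ring-allreduce
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

    model.fit(
        x_train, y_train, batch_size=128, epochs=1,
        # Broadcast initial weights from rank 0 so all workers start in sync
        callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
        verbose=1 if hvd.rank() == 0 else 0,
    )
    ```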
  • Streaming Analytics Use Cases on Apache Spark™
    Deepsha Menghani, Product Marketing Manager at Microsoft; Dhruv Kumar, Solutions Architect; Brian Dirking, Senior Director of Partner Marketing. Recorded: May 17 2018 (60 mins)
    Real-time analytics is crucial to many use cases. Apache Spark™ provides the framework and high-volume processing to deliver answers from your streaming data. Join us in this webinar and see a demonstration of how to build IoT and clickstream analytics notebooks in Azure Databricks. These notebooks will use Python and SQL code to capture data from Azure Event Hubs and Azure IoT Hub, parse the data, and make it available to machine learning models. See how your organization can start taking advantage of your streaming data.
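
    For a flavor of what such a notebook looks like, here is a hedged PySpark Structured Streaming sketch. It assumes the Azure Event Hubs Spark connector is attached to the cluster; the connection string and JSON schema are placeholders, and newer connector versions also expect the connection string to be encrypted via the connector's utility class:

    ```python
    # Read an Event Hubs stream, parse JSON telemetry, and expose it for ML scoring.
    # `spark` is the session predefined in a Databricks notebook.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructType

    schema = (StructType()
              .add("deviceId", StringType())
              .add("temperature", DoubleType()))

    raw = (spark.readStream
           .format("eventhubs")
           .option("eventhubs.connectionString", "<your-connection-string>")
           .load())

    # Event Hubs delivers the payload as binary in the `body` column
    events = (raw.selectExpr("CAST(body AS STRING) AS json")
                 .select(from_json(col("json"), schema).alias("data"))
                 .select("data.*"))

    # Console sink for exploration; swap in a table or scoring step for production
    query = events.writeStream.format("console").outputMode("append").start()
    ```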
  • Is Your Data Lake GDPR Ready? How to Avoid Drowning in Data Requests
    Arsalan Tavakoli-Shiraji, VP of Solutions; Justin Olsson, Senior Legal Counsel; and Michael Armbrust, Software Engineer. Recorded: May 9 2018 (41 mins)
    With GDPR enforcement rapidly approaching on May 25, many companies are still trying to figure out how to comply with one of the regulation's biggest pain points: data subject requests (DSRs). Under GDPR, data subjects (individuals) in the EU have the right to request information on what personal data is collected and how it is being used, and to have that data changed or erased.

    For many organizations that rely on data lakes to store their big data, sifting through millions of files to locate and modify records for a DSR is a massive effort at minimum, and trying to do this within the prescribed timelines is nearly impossible.

    Fortunately, there's a path forward. Through an optimized approach to data management, Databricks powered by Apache Spark™ makes it easy to quickly find, edit, and erase data submerged deep within your data lake without disrupting your data pipelines.

    Join this webinar to learn:
    • The GDPR requirements of data subject requests
    • The challenges big data and data lakes create for organizations
    • How Databricks Delta, a powerful new offering within the Databricks Unified Analytics Platform, improves data lake management and makes it possible to quickly find and surgically remove or modify individual records (see the sketch after this list)
    • Best practices for GDPR data governance
    • Live demo on how to easily fulfill data requests with Databricks
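
    As a rough illustration of what "surgically remove or modify" looks like against a Databricks Delta table, here is a hedged sketch; the `customer_events` table and `email` column are hypothetical stand-ins for your own schema:

    ```python
    # Fulfilling a data subject request against a Delta table.
    # `spark` is the session predefined in a Databricks notebook.
    subject = "jane.doe@example.com"

    # Right of access: find every record held about the data subject
    spark.sql(f"SELECT * FROM customer_events WHERE email = '{subject}'").show()

    # Right to rectification: modify records in place
    spark.sql(f"UPDATE customer_events SET country = 'DE' WHERE email = '{subject}'")

    # Right to erasure: delete the subject's records without hand-rewriting files
    spark.sql(f"DELETE FROM customer_events WHERE email = '{subject}'")
    ```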
  • Collaboration to Production with Apache Spark on Azure Databricks
    Sandy May, Data Shepherd at Elastacloud. Recorded: Apr 27 2018 (54 mins)
    Sandy will highlight key aspects of Databricks' new Spark-as-a-Service offering in Azure, leveraging the power of Databricks notebooks to showcase loading and cleaning data in SQL and Scala, moving through exploration, and going all the way to putting a model into production.
  • Apache Spark™ for Machine Learning and AI
    Brian Dirking, Senior Director of Partner Marketing at Databricks, and Nauman Fakhar, System Architect at Databricks. Recorded: Apr 26 2018 (61 mins)
    Azure Databricks is an Apache Spark™-based platform, providing the scale, collaborative workspace, and integration with your Azure environment that make it the best place to run your ML and AI workloads on Azure. This webinar will include an in-depth demo of key AI and ML use cases.
  • How Viacom Revolutionized Audience Experiences with Real-Time Analytics at Scale
    Mark Cohen, VP of Data Platform Engineering at Viacom; Chris Burns, Machine Learning Solutions Architect at AWS. Recorded: Apr 25 2018 (59 mins)
    With 170+ global networks, Viacom is focused on providing an amazing audience experience to its billions of viewers around the world. Core to this strategy is leveraging big data and advanced analytics to offer the right content to the right audience and deliver it flawlessly on any device. To make this possible, Viacom set out to build a real-time, scalable data analytics platform on Apache Spark™.

    Join this webinar to learn how Viacom overcame the complexities of Spark with Databricks and AWS to build an end-to-end scalable self-service insights platform that delivers on a wide range of analytics use cases.

    This webinar will cover:
    - The challenges Viacom faced building a scalable, real-time data insights and AI platform
    - How they overcame these challenges with Spark, AWS and Databricks
    - How they leverage a unified analytics platform for data pipelines, analytics and machine learning to reduce video start delays and improve content delivery with stream analytics at scale
    - What it takes to create a data-driven culture with self-service analytics that meet the needs of business users, data analysts, and data scientists
  • Getting Started with Apache Spark™ on Azure Databricks
    Brian Dirking, Senior Director of Partner Marketing at Databricks, and Nauman Fakhar, System Architect at Databricks. Recorded: Mar 27 2018 (60 mins)
    Learn the basics of Apache Spark™ on Azure Databricks. Designed by Databricks, in collaboration with Microsoft, Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click set up, streamlined workflows and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

    This webinar will cover the following topics:

    · RDDs, DataFrames, Datasets, and other fundamentals of Apache Spark (a quick primer follows this list).
    · How to quickly set up Azure Databricks, relieving you of DataOps duties.
    · How to use the Databricks interactive notebooks, which provide a collaborative space for your entire analytics team, and how you can schedule notebooks, immediately putting your work into production.
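
    As a quick primer on the first bullet, the snippet below contrasts RDDs and DataFrames in an Azure Databricks notebook, where `spark` and `sc` are predefined; the sample data is invented:

    ```python
    # RDD: a low-level, untyped distributed collection of Python objects
    rdd = sc.parallelize([("alice", 34), ("bob", 29)])
    print(rdd.map(lambda kv: kv[1]).mean())  # average age via the RDD API

    # DataFrame: named columns plus the Catalyst query optimizer
    df = spark.createDataFrame(rdd, ["name", "age"])
    df.groupBy().avg("age").show()

    # Datasets add compile-time typing but exist only in Scala/Java;
    # from Python you work with DataFrames.
    ```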
  • Fast and Reliable ETL Pipelines with Databricks
    Prakash Chockalingam, Product Manager at Databricks. Recorded: Mar 7 2018 (57 mins)
    Building multiple ETL pipelines is complex and time-consuming, making it an expensive endeavor. As the number of data sources and the volume of the data increase, the ETL time also increases, delaying the point at which an enterprise can derive value from its data.

    Join Prakash Chockalingam, Product Manager and data engineering expert at Databricks, to learn how to avoid the common pitfalls of data engineering and how the Databricks Unified Analytics Platform can ensure performance and reliability at scale to lower total cost of ownership (TCO).

    In this webinar, you will learn how Databricks can help to:
    - Remove infrastructure configuration complexity to reduce DevOps efforts
    - Optimize your ETL data pipelines for performance without compromising reliability
    - Unify data engineering and data science to accelerate innovation for the business (a pipeline sketch follows this list)
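
    To ground the pipeline discussion, here is a minimal batch ETL sketch in PySpark; the paths and column names are hypothetical placeholders for your own sources and sinks:

    ```python
    # Extract-transform-load in PySpark; `spark` is the ambient notebook session
    from pyspark.sql.functions import col, to_date

    # Extract: read raw CSV files landed by upstream systems
    raw = spark.read.option("header", True).csv("/mnt/raw/orders/")

    # Transform: enforce types, derive columns, and drop malformed rows
    clean = (raw
             .withColumn("amount", col("amount").cast("double"))
             .withColumn("order_date", to_date(col("order_ts")))
             .dropna(subset=["order_id", "amount"]))

    # Load: write a partitioned Parquet table for analysts and data scientists
    (clean.write.mode("overwrite")
          .partitionBy("order_date")
          .parquet("/mnt/curated/orders/"))
    ```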
