
Databricks

  • How Databricks and Machine Learning is Powering the Future of Genomics
    Frank Austin Nothaft, Genomics Data Engineer at Databricks Recorded: May 25 2017 59 mins
    With the drastic drop in the cost of sequencing a single genome, many organizations across biotechnology, pharmaceuticals, biomedical research, and agriculture have begun to make use of genome sequencing. While the sequence of a single genome may provide insight about the individual who was sequenced, to derive maximal insight from the genomic data, the ultimate goal is to query across a cohort of many hundreds to thousands of individuals.

    Join this webinar to learn how Databricks — powered by Apache Spark — enables queries across a database of genomics in interactive time and simplifies the application of machine learning models and statistical tests to genomics data across patients, to derive more insight into the biological processes driven by genomic alterations.

    In this webinar, we will:

    - Demonstrate how Databricks can rapidly query annotated variants across a cohort of 1,000 samples.
    - Look at a case study using Databricks to improve the performance of running an expression quantitative trait loci (eQTL) test across samples from the GEUVADIS project.
    - Show how we can parallelize conventional genomics tools using Databricks.
  • Deploying Machine Learning Techniques at Petabyte Scale with Databricks
    Saket Mengle, Senior Principal Data Scientist at DataXu Recorded: May 22 2017 61 mins
    The central premise of DataXu is to apply data science to better marketing. At its core is the Real-time Bidding Platform, which processes 2 petabytes of data per day and responds to ad auctions at a rate of 2.1 million requests per second across 5 continents. Built on top of this platform is DataXu’s analytics engine, which gives clients analytics reports that address their marketing business questions. Both platforms require real-time processing, scalable machine learning, and ad-hoc analytics.

    This webinar will showcase how DataXu has used the Apache® Spark™ framework and Databricks to address all of the above challenges while keeping the agility and rapid prototyping strengths needed to take a product from the initial R&D phase to full production.

    We will also discuss in detail:

    - The challenges of using Apache Spark in a petabyte-scale machine learning system and how we worked to solve them.
    - Best practices for each step, from large-scale Spark ETL processing and model testing through to interactive analytics.
  • Deep Learning on Apache® Spark™: Workflows and Best Practices
    Tim Hunter and Jules S. Damji Recorded: May 4 2017 47 mins
    The combination of Deep Learning with Apache Spark has the potential for tremendous impact in many sectors of the industry. This webinar, based on the experience gained in assisting customers with the Databricks Virtual Analytics Platform, will present some best practices for building deep learning pipelines with Spark.

    Rather than comparing deep learning systems or specific optimizations, this webinar will focus on issues that are common to deep learning frameworks when running on a Spark cluster, including:

    * optimizing cluster setup;
    * configuring the cluster;
    * ingesting data; and
    * monitoring long-running jobs.

    We will demonstrate the techniques we cover using Google’s popular TensorFlow library. More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters.

    Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput, and monitoring facilitates both the work of configuration and the stability of deep learning jobs.
  • Databricks Product Demonstration
    Don Hilborn Recorded: Apr 19 2017 48 mins
    This is a live demonstration of the Databricks virtual analytics platform.
  • How to Increase Data Science Agility at Scale with Databricks
    Maddie Schults Recorded: Mar 30 2017 51 mins
    Apache® Spark™ has become an indispensable tool for data science teams. Its performance and flexibility enable data scientists to do everything from interactive exploration and feature engineering to model tuning with ease. In this webinar, Maddie Schults, Databricks product manager, will discuss how Databricks allows data science teams to use Apache Spark for their day-to-day work.

    You will learn:

    - Obstacles faced by data science teams in the era of big data;
    - How Databricks simplifies Spark development;
    - A demonstration of key Databricks functionalities that help data scientists become more productive.
  • Databricks Product Demonstration
    Miklos Christine Recorded: Mar 29 2017 63 mins
    This is a live demonstration of the Databricks virtual analytics platform.
  • Databricks Product Demonstration
    Jason Pohl Recorded: Mar 15 2017 45 mins
    This is a live demonstration of the Databricks virtual analytics platform.
  • Apache® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
    Richard Garris and Jules S. Damji Recorded: Mar 9 2017 61 mins
    Apache Spark has rapidly become a key tool for data scientists to explore, understand, and transform massive datasets and to build and train advanced machine learning models. The question then becomes: how do I deploy these models to a production environment? How do I embed what I have learned into customer-facing data applications?

    In this webinar, we will discuss best practices from Databricks on how our customers productionize machine learning models, do a deep dive with actual customer case studies, and show live tutorials of a few example architectures and code in Python, Scala, Java and SQL.
  • How Smartsheet operationalized Apache Spark with Databricks
    Francis Lau, Senior Director, Product Intelligence at Smartsheet Recorded: Feb 23 2017 61 mins
    Apache Spark is red hot, but without the necessary skill sets it can be a challenge to operationalize, making it difficult to build a robust production data pipeline that business users and data scientists across your company can use to unearth insights.

    Smartsheet is the world’s leading SaaS platform for managing and automating collaborative work. With over 90,000 companies and millions of users, it helps teams get work done ranging from managing simple task lists to orchestrating the largest sporting events and construction projects.

    In this webinar, you will learn how Smartsheet uses Databricks to overcome the complexities of Spark and build its own analysis platform, one that enables self-service insights at will, at scale, and at speed so the company can better understand its customers’ diverse use cases. Smartsheet will share valuable patterns and lessons learned in both technical and adoption areas, including:

    - How to build a robust metadata-driven data pipeline that processes application and business systems data to provide a 360 view of customers and to drive smarter business systems integrations.
    - How to provide an intuitive and valuable “pyramid” of datasets usable by both technical and business users.
    - Their roll-out approach and the materials used for company-wide adoption, allowing users to go from zero to insights with Spark and Databricks in minutes.
  • Apache® Spark™ - The Unified Engine for All Workloads
    Tony Baer, Principal Analyst at Ovum Recorded: Jan 12 2017 63 mins
    The Apache® Spark™ compute engine has gone viral – not only is it the most active Apache big data open source project, but it is also the fastest growing big data analytics workload, on and off Hadoop. The major reason behind Spark’s popularity with developers and enterprises is its flexibility to support a wide range of workloads including SQL query, machine learning, streaming, and graph analysis.


    This webinar features Ovum analyst Tony Baer, who will explain the real-world benefits to practitioners and enterprises when they build a technology stack based on a unified approach with Apache Spark.

    This webinar will cover:
    - Findings around the growth of Spark and diverse applications using machine learning and streaming.
    - The advantages of using Spark to unify all workloads, rather than stitching together many specialized engines like Presto, Storm, MapReduce, Pig, and others.
    - Use case examples that illustrate the flexibility of Spark in supporting various workloads.
