  • How Edmunds.com Leverages Apache® Spark™ on Databricks to Improve Customer Conversion Shaun Elliott, Christian Lugo Recorded: Oct 19 2016 60 mins
    Edmunds.com is a leading online car information and shopping marketplace serving nearly 20 million visitors to its website each month. With data growing 10x to hundreds of terabytes over the past four years, its engineering team was looking for ways to increase consumer engagement and conversion by improving the data integrity of Edmunds' website.

    Databricks simplifies the management of their Apache Spark infrastructure while accelerating data exploration at scale. Now they can quickly analyze large datasets to determine the best sources for car data on their website.

    In this webinar, you will learn:

    - Why Edmunds.com moved from MapReduce to Databricks for ad hoc data exploration.
    - How Databricks democratized data access across teams to improve decision making and feature innovation.
    - Best practices for doing ETL and building a robust data pipeline with Databricks.
  • How Omega Point Helps Investors Optimize their Portfolios with Apache® Spark™ on Databricks Omer Cedar, CEO, Omega Point and Eran Cedar, CTO, Omega Point Recorded: Aug 18 2016 56 mins
    Omega Point uses big data analytics to enable investment professionals to reduce risk while increasing their returns. Databricks enables Omega Point to uncover performance drivers of investment portfolios using massive volumes of market data. Join us to learn how Omega Point built a next-generation investment analytics platform to isolate critical market signals from noise with a big data architecture built with Apache Spark on Databricks.
  • Databricks' Data Pipelines: Journey and Lessons Learned Burak Yavuz Recorded: Aug 4 2016 58 mins
    With components like Spark SQL, MLlib, and Streaming, Spark is a unified engine for building data applications. In this talk, we will take a look at how we use Spark on our own Databricks platform.

    In this webinar, we discuss the role and importance of ETL and the common features of an ETL pipeline. We then show how the same ETL fundamentals are applied and (more importantly) simplified within Databricks' data pipelines. By using Apache Spark as the foundation, we can simplify our ETL processes into one framework. With Databricks, you can develop your pipeline code in notebooks, create Jobs to productionize your notebooks, and use REST APIs to turn all of this into a continuous integration workflow. We will share tips and tricks for doing ETL with Spark and lessons learned from our own pipeline, and a minimal sketch of one such pipeline step follows below.
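
    As a rough illustration of the notebook-to-Jobs workflow described above, a single ETL step of this kind might look like the following PySpark sketch; the paths and column names are hypothetical:

        # Minimal PySpark ETL sketch of the kind that might run as a Databricks notebook Job.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

        # Extract: read raw JSON events (path is a placeholder).
        raw = spark.read.json("/mnt/raw/events/")

        # Transform: keep well-formed rows and derive a date column.
        clean = (raw
                 .filter(F.col("user_id").isNotNull())
                 .withColumn("event_date", F.to_date("timestamp")))
        daily = clean.groupBy("event_date", "user_id").count()

        # Load: write the aggregate out as Parquet, partitioned by date.
        daily.write.mode("overwrite").partitionBy("event_date").parquet("/mnt/clean/daily_events/")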
  • How DNV GL is removing analytic barriers in the energy industry with Databricks Jonathan Farland, Senior Data Scientist, DNV GL and Kyle Pistor, Solutions Engineer, Databricks Recorded: Jul 20 2016 55 mins
    Smart meter sensor data presents tremendous opportunities for the energy industry to better understand customers and anticipate their needs. With smart meter data, energy analysts and utilities can use hourly readouts to gain high-resolution insight into energy consumption patterns across structures and customer types, as well as near real-time visibility into grid operations.

    Join Jonathan Farland, a technical consultant at DNV GL Energy, to learn how this globally renowned energy company is processing data at scale and mining deeper insights by leveraging statistical learning techniques. In this talk, Jon will share how DNV GL is using Apache Spark and Databricks to turn smart meter data into insights to better serve their customers by:

    - Accelerating data processing relative to competing platforms, at times by nearly 100x, without incurring additional operational costs.
    - Scaling to any size on demand, decoupling compute and storage resources to minimize operational expense.
    - Eliminating the need to spend time on DevOps, allowing their data scientists and engineers to focus on solving data problems.
  • Better Sales Performance with Databricks Justin Mills and Anna Holschuh of Yesware Recorded: Jun 23 2016 57 mins
    In this webinar, you will learn how Yesware used Databricks to radically improve the reliability, scalability, and ease of development of its Apache Spark data pipeline. Specifically, the Yesware team will cover the workflow of taking an idea from the prototyping stage in a Databricks notebook to a final, fully tested, peer-reviewed, and versioned production feature that produces high-quality data for Yesware customers on a daily basis.
  • Deep Dive: Apache Spark Memory Management Andrew Or Recorded: Jun 15 2016 43 mins
    Memory management is at the heart of any data-intensive system. Spark, in particular, must arbitrate memory allocation between two main use cases: buffering intermediate data for processing (execution) and caching user data (storage). This talk will take a deep dive through the memory management designs adopted in Spark since its inception and discuss their performance and usability implications for the end user.
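
    As a minimal sketch of the execution/storage split discussed above, the two settings below govern it under the unified memory manager introduced in Spark 1.6 (the values shown are the defaults):

        # Sketch: the unified memory manager splits one region between
        # execution and storage; these two settings control the split.
        from pyspark import SparkConf
        from pyspark.sql import SparkSession

        conf = (SparkConf()
                .set("spark.memory.fraction", "0.6")          # share of the heap for execution + storage
                .set("spark.memory.storageFraction", "0.5"))  # portion of that region shielded for cached blocks

        spark = SparkSession.builder.config(conf=conf).appName("memory-demo").getOrCreate()

        df = spark.range(10**7)
        df.cache()   # cached partitions occupy storage memory...
        df.count()   # ...while the scan and count borrow execution memory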
  • Productionizing your Streaming Jobs Prakash Chockalingam Recorded: May 26 2016 60 mins
    Apache Spark™ Streaming is one of the most popular stream processing frameworks, enabling scalable, high-throughput, fault-tolerant processing of live data streams. In this talk, we will focus on the following aspects of Spark Streaming:

    - Motivation and most common use cases for Spark Streaming
    - Common design patterns that emerge from these use cases and tips to avoid common pitfalls while implementing these design patterns
    - Performance optimization techniques
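
    As a concrete starting point, a minimal fault-tolerant Spark Streaming job (a word count over a socket source, with checkpointing enabled) might be sketched as follows; the host, port, and checkpoint path are placeholders:

        # Minimal Spark Streaming sketch: socket word count with checkpointing.
        from pyspark import SparkContext
        from pyspark.streaming import StreamingContext

        sc = SparkContext(appName="streaming-sketch")
        ssc = StreamingContext(sc, 10)               # 10-second micro-batches
        ssc.checkpoint("/tmp/streaming-checkpoint")  # enables driver recovery

        lines = ssc.socketTextStream("localhost", 9999)
        counts = (lines.flatMap(lambda line: line.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))
        counts.pprint()

        ssc.start()
        ssc.awaitTermination()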
  • Enabling Exploratory Analysis of Large Data with Apache Spark and R Hossein Falaki Recorded: May 19 2016 60 mins
    R has evolved into an ideal environment for exploratory data analysis. The language is highly flexible - there is an R package for almost any algorithm - and the environment comes with integrated help and visualization. SparkR adds distributed computing and the ability to handle very large data to this list. SparkR is an R package distributed with Apache Spark. It exposes Spark DataFrames, which were inspired by R data.frames, to R. With Spark DataFrames and Spark's in-memory computing engine, R users can interactively analyze and explore terabyte-size datasets.

    In this webinar, Hossein will introduce SparkR and how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: the exploratory analysis of very large data. Specifically, he will show how Spark's features and capabilities, such as caching distributed data and integrated SQL execution, complement R's great tools, such as visualization and its diverse packages, in a real-world data analysis project with big data.
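
    The core of that workflow - cache a large distributed dataset, then explore it with SQL - can be sketched as follows; it is written in PySpark for consistency with the other examples here, but SparkR exposes the same DataFrame operations to R (the path and schema are hypothetical):

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("explore-sketch").getOrCreate()

        flights = spark.read.parquet("/mnt/data/flights/")  # terabyte-scale in practice
        flights.cache()                                     # keep the working set in cluster memory

        flights.createOrReplaceTempView("flights")          # integrated SQL execution
        spark.sql("""
            SELECT origin, AVG(delay) AS avg_delay
            FROM flights
            GROUP BY origin
            ORDER BY avg_delay DESC
            LIMIT 10
        """).show()   # a small result, safe to pull back for plotting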
  • Apache Spark 2.0: Faster, Easier, and Smarter Reynold Xin Recorded: May 5 2016 61 mins
    In this webcast, Reynold Xin from Databricks will be speaking about Apache Spark's major 2.0 release.

    The major themes for Spark 2.0 are:
    - Unified APIs: Emphasis on building up higher-level APIs, including the merging of the DataFrame and Dataset APIs
    - Structured Streaming: Simplify streaming by building continuous applications on top of DataFrames, allowing streaming, interactive, and batch queries to be unified
    - Tungsten Phase 2: Speed up Apache Spark by 10x
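
    The Structured Streaming theme is easiest to see in code: the sketch below applies an ordinary DataFrame aggregation to a streaming source (the input path and schema are placeholders):

        # Sketch of a "continuous application" in Spark 2.0's Structured Streaming.
        from pyspark.sql import SparkSession
        from pyspark.sql.types import StructType, StructField, StringType

        spark = SparkSession.builder.appName("structured-streaming-sketch").getOrCreate()

        schema = StructType([StructField("user", StringType()),
                             StructField("action", StringType())])
        events = spark.readStream.schema(schema).json("/mnt/incoming/")

        # The same groupBy/count you would write for a batch DataFrame.
        counts = events.groupBy("action").count()

        query = (counts.writeStream
                 .outputMode("complete")   # re-emit the full aggregate each trigger
                 .format("console")
                 .start())
        query.awaitTermination()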
  • GraphFrames: DataFrame-based graphs for Apache® Spark™ Joseph Bradley Recorded: Apr 14 2016 53 mins
    GraphFrames bring the power of Apache Spark DataFrames to interactive analytics on graphs.

    Expressive motif queries simplify pattern search in graphs, and DataFrame integration allows seamlessly mixing graph queries with Spark SQL and ML. By leveraging Catalyst and Tungsten, GraphFrames provide scalability and performance. Uniform language APIs expose the full functionality of GraphX to Java and Python users for the first time.

    In this talk, the developers of the GraphFrames package will give an overview, a live demo, and a discussion of design decisions and future plans. This talk will be generally accessible, covering major improvements from GraphX and providing resources for getting started. A running example of analyzing flight delays will be used to explain the range of GraphFrame functionality: simple SQL and graph queries, motif finding, and powerful graph algorithms.

    For experts, this talk will also include a few technical details on design decisions, the current implementation, and ongoing work on speed and performance optimizations.
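
    Using the flight-delay example, a minimal GraphFrames session might look like the sketch below; it assumes the graphframes package is attached to the cluster, and the toy data is made up:

        from pyspark.sql import SparkSession
        from graphframes import GraphFrame

        spark = SparkSession.builder.appName("graphframes-sketch").getOrCreate()

        vertices = spark.createDataFrame(
            [("SFO", "San Francisco"), ("JFK", "New York"), ("ORD", "Chicago")],
            ["id", "city"])
        edges = spark.createDataFrame(
            [("SFO", "JFK", 42), ("JFK", "ORD", 7), ("ORD", "SFO", 13)],
            ["src", "dst", "delay"])

        g = GraphFrame(vertices, edges)

        # Motif query: two-hop routes a -> b -> c, filtered with a SQL expression.
        g.find("(a)-[e1]->(b); (b)-[e2]->(c)").filter("e1.delay + e2.delay > 20").show()

        # A built-in graph algorithm; results come back as DataFrames.
        g.pageRank(resetProbability=0.15, maxIter=5).vertices.show()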
