Deploying Machine Learning Techniques at Petabyte Scale with Databricks

Presented by

Saket Mengle, Senior Principal Data Scientist at DataXu

About this talk

The central premise of DataXu is to apply data science to better marketing. At its core is the Real-Time Bidding Platform, which processes 2 petabytes of data per day and responds to ad auctions at a rate of 2.1 million requests per second across five continents. Built on top of this platform is DataXu’s analytics engine, which gives clients insightful reports that address their marketing business questions. Both platforms share some common requirements: real-time processing, scalable machine learning, and ad-hoc analytics. This webinar will showcase DataXu’s successful use cases of the Apache® Spark™ framework and Databricks in addressing all of these challenges while retaining the agility and rapid prototyping needed to take a product from the initial R&D phase to full production. We will also discuss in detail the challenges of using Apache Spark in a petabyte-scale machine learning system and how we worked to solve them, along with best practices for each step of large-scale Spark ETL processing and model testing, all the way through to interactive analytics.