Best Practices: How To Build Scalable Data Pipelines for Machine Learning

Presented by

Jorge Villamariona and Pradeep Reddy, Qubole

About this talk

Data engineers today serve a wider audience than they did just a few years ago. Companies now need to apply machine learning (ML) techniques to their data in order to remain relevant. Among the new challenges data engineers face are building and filling data lakes and reliably delivering complete, large-volume data sets so that data scientists can train more accurate models. Beyond handling larger data volumes, these pipelines need to be flexible enough to accommodate the variety of data and the high processing velocity that new ML applications require. Qubole addresses these challenges by providing an auto-scaling, cloud-native platform for building and running these data pipelines. In this webinar we will cover:
- Some of the typical challenges data engineers face when building pipelines for machine learning.
- Typical uses of the various Qubole engines to address these challenges.
- Real-world customer examples.

More from this channel

Tune in to hear open data lake platform leaders and engineers discuss everything from continuous data engineering on data lakes for machine learning to streaming analytics, ad-hoc analytics, and data exploration in the cloud. The interactive talks are designed for data engineers, data analysts, and data scientists who want to learn about the challenges and solutions for use cases seen in data-driven organizations. Learn more about Qubole: http://bit.ly/AboutQubole