Just-in-Time Data Warehousing on Databricks: CDC and Schema On Read

Presented by

Jason Pohl

About this talk

In this webcast, Jason Pohl, Solution Engineer from Databricks, will cover how to build a Just-in-Time Data Warehouse on Databricks, with a focus on performing Change Data Capture from a relational database and joining that data to a variety of data sources. Not only do Apache Spark and Databricks make this easier and require less code, but the routine will also automatically ingest changes to the source schema.

Highlights of this webinar include:

1. Starting with a Databricks notebook, Jason will build a classic Change Data Capture (CDC) ETL routine to extract data from an RDBMS.
2. A deep dive into selecting a delta of changes from tables in an RDBMS, writing it to Parquet, and querying it using Spark SQL.
3. A demonstration of how to apply a schema at the time of read rather than before write.
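The two ideas in the highlights, selecting only the changed rows and applying a schema when reading, can be illustrated with a small sketch. This is not the notebook from the webinar and it does not use Spark: it assumes a hypothetical `customers` table with an `updated_at` column used as a high-watermark, uses Python's standard-library `sqlite3` as a stand-in for the RDBMS, and uses an in-memory list of JSON strings as a stand-in for Parquet files. All table, column, and function names here are illustrative.

```python
import json
import sqlite3

# Hypothetical source table; sqlite3 stands in for the production RDBMS.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 105)],
)


def extract_delta(conn, table, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    SELECT * is deliberate: columns added to the source table later are
    picked up without changing this function.
    """
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    )
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    new_watermark = max((r["updated_at"] for r in rows), default=last_watermark)
    return rows, new_watermark


# Stand-in for files in a landing zone: records are stored as written,
# with no schema enforced at write time.
landing_zone = []


def land(rows):
    landing_zone.extend(json.dumps(r) for r in rows)


def read_with_schema(schema):
    """Apply a schema at read time: project the requested columns and
    fill columns that older records lack with None."""
    records = (json.loads(line) for line in landing_zone)
    return [{col: rec.get(col) for col in schema} for rec in records]


# First run: everything is newer than watermark 0.
rows, watermark = extract_delta(source, "customers", 0)
land(rows)

# The source schema changes, one row is updated, and one row is added.
source.execute("ALTER TABLE customers ADD COLUMN email TEXT")
source.execute(
    "UPDATE customers SET name = 'Grace H.', email = 'grace@example.com', "
    "updated_at = 110 WHERE id = 2"
)
source.execute("INSERT INTO customers VALUES (3, 'Linus', 112, NULL)")

# Second run: only the changed rows come back, including the new column.
delta, watermark = extract_delta(source, "customers", watermark)
land(delta)

# Old and new records are read together under one schema.
latest = read_with_schema(["id", "name", "email"])
```

Note that a watermark query of this kind only sees inserts and updates; rows deleted from the source never appear in the delta, so deletes need separate handling.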

More from this channel

No matter what stage of your data journey you’re in, this channel will help you better understand the fundamental concepts of the Databricks Lakehouse platform and the problems we’re helping to solve for data teams.