Just-in-Time Data Warehousing on Databricks: CDC and Schema On Read

Presented by

Jason Pohl

About this talk

In this webcast, Jason Pohl, Solution Engineer at Databricks, will cover how to build a Just-in-Time Data Warehouse on Databricks, focusing on performing Change Data Capture (CDC) from a relational database and joining that data to a variety of other data sources. Apache Spark and Databricks let you do this more easily and with less code, and the routine also ingests changes to the source schema automatically.

Highlights of this webinar include:

1. Starting with a Databricks notebook, Jason will build a classic CDC ETL routine to extract data from an RDBMS.
2. A deep dive into selecting a delta of changes from tables in an RDBMS, writing it to Parquet, and querying it with Spark SQL.
3. A demonstration of applying a schema at read time rather than before write.
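
The delta-selection step in the second highlight can be sketched with a "high-water mark": remember the largest change marker seen so far and, on each run, pull only rows whose marker is greater. The sketch below is a minimal, library-free illustration, not the routine from the talk; it uses Python's built-in sqlite3 as a stand-in for the source RDBMS and an in-memory dict as a stand-in for the Parquet target. The helper names `extract_delta` and `apply_delta`, the `customers` table, and the integer `updated_at` change marker are all hypothetical.

```python
import sqlite3


def extract_delta(conn, table, watermark_col, last_watermark):
    """Return rows changed since last_watermark, plus the new watermark.

    Assumes (hypothetically) that the source table carries a monotonically
    increasing change marker such as a last-modified timestamp.
    `table` and `watermark_col` are interpolated directly, so they must come
    from trusted configuration, never from user input.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {watermark_col} > ? ORDER BY {watermark_col}",
        (last_watermark,),
    )
    # SELECT * plus cursor.description means newly added source columns
    # show up in the extracted rows without any code change.
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    new_watermark = rows[-1][watermark_col] if rows else last_watermark
    return rows, new_watermark


def apply_delta(snapshot, rows, key):
    """Upsert changed rows into a snapshot keyed by primary key."""
    for row in rows:
        snapshot[row[key]] = row
    return snapshot


# Hypothetical source table; integers stand in for timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ada", 1), (2, "Grace", 2)])

# Initial load: everything after watermark 0.
snapshot = {}
rows, watermark = extract_delta(conn, "customers", "updated_at", 0)
apply_delta(snapshot, rows, "id")

# The source schema changes and one row is updated.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
conn.execute(
    "UPDATE customers SET name = 'Ada L.', email = 'ada@example.com', updated_at = 3 WHERE id = 1"
)

# Incremental load: only the changed row, now carrying the new column.
rows, watermark = extract_delta(conn, "customers", "updated_at", watermark)
apply_delta(snapshot, rows, "id")
```

One design caveat: a high-water mark on a modification column does not see hard deletes, so sources that physically delete rows need soft-delete flags or a log-based CDC feed instead. In a Spark setting, the extraction query would typically go through Spark's JDBC data source and the delta would be appended to Parquet rather than to a dict.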

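The schema-on-read idea in the third highlight means raw records are landed exactly as extracted, and types and columns are imposed only when the data is queried. The following is a minimal pure-Python sketch of that idea, not the talk's implementation: JSON lines stand in for raw files, and the read-time schema is a hypothetical mapping from field name to converter. `RAW_LINES`, `READ_SCHEMA`, and `read_with_schema` are illustrative names.

```python
import json

# Raw landing zone: no schema enforced at write time. The last record carries
# a column ("email") that was added upstream after the first loads.
RAW_LINES = [
    '{"id": "1", "name": "Ada", "updated_at": "1"}',
    '{"id": "2", "name": "Grace", "updated_at": "2"}',
    '{"id": "1", "name": "Ada L.", "updated_at": "3", "email": "ada@example.com"}',
]

# Hypothetical read-time schema: field name -> type converter.
READ_SCHEMA = {"id": int, "name": str, "updated_at": int, "email": str}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, projecting and casting each record to `schema`.

    Fields missing from a record become None; fields not in the schema are dropped.
    """
    for line in lines:
        raw = json.loads(line)
        yield {
            field: (cast(raw[field]) if raw.get(field) is not None else None)
            for field, cast in schema.items()
        }


records = list(read_with_schema(RAW_LINES, READ_SCHEMA))
```

Because the schema lives in the reader, picking up a new column means editing `READ_SCHEMA` and re-reading; the stored data never has to be rewritten. In Spark, the analogous step is supplying a schema to the DataFrame reader, or letting Parquet reconcile differing file schemas with its `mergeSchema` option.
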
About Databricks

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership.