About this talk

To make raw data available for business users and data scientists to consume, companies often build complex ETL pipelines in which data is copied many times between systems. These pipelines can be hard to maintain and prone to breakage. Dremio has developed a new approach called data reflections. When used with a cost-based optimizer such as the one in Apache Calcite, data reflections can accelerate queries without requiring data engineers to create copies of the data by hand, or data consumers to choose among different materializations of the data to get the performance they need. Data reflections also separate the logical world, where analysts and data scientists curate and transform the model of the data, from the physical world, where data must be physically optimized so that execution engines can respond to queries in real time. Beyond an overview of data reflections and their technological underpinnings, we’ll give a live demo of Dremio, which uses reflections to accelerate workloads automatically, without explicitly moving or copying data, while leaving users free to curate and transform the logical model of the data. We’ll also talk about how to get started with Dremio and share a few specific use cases.

Data Reflections
