Lakehouse: Smart Iceberg Table Optimizer

Logo
Presented by

Rajasekhar Konda, Steve Zhang

About this talk

The new lakehouse architectural design pattern provides many technical benefits like ACID support, time travel for machine learning, and better query performance. Apache Iceberg implements this pattern and provides the flexibility to further enhance based on real-world needs. Using Apache Iceberg table format requires special vacuuming like snapshot expire and orphan removal for data governance and metadata and data compaction for efficient and fast access of data. This video will discuss how to handle these table operations at very large scale by keeping cost in mind without compromising on data engineering, ML, analytical, and BI use cases. Automating these operations makes life easier for engineers leveraging the platforms without having to worry about how Iceberg internals work. We will share lessons learned for optimizing the streaming and batch data sets in a cost-effective and efficient way.
Related topics:

More from this channel

Upcoming talks (0)
On-demand talks (103)
Subscribers (4445)
Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all of your data. Dremio increases agility with a revolutionary data-as-code approach that enables Git-like data experimentation, version control, and governance.