Apache Arrow: Delivering high performance data lake access with Hyperrest

Presented by

Sudheesh Katkam

About this talk

Pandas is one of the most popular data analytics frameworks for Python. It supports many data sources, including relational databases, as long as a compatible ODBC driver exists. However, the ODBC API is designed around a record-centric paradigm, so data received from an ODBC source must be converted before it can populate a Pandas DataFrame. As a result, ODBC access is relatively slow compared to other approaches to data access.

Dremio uses Apache Arrow in-memory columnar storage to represent data internally. The Arrow format is very similar to a Pandas DataFrame, and the Apache Arrow project provides fast conversion functions between Arrow and Pandas. Dremio also offers both an ODBC driver for general-purpose tools and an API to return Arrow data directly to Pandas. This webinar explores how the two compare.

Join this webinar to learn:

- How Dremio uses Hyperrest, a new Apache Arrow API.
- How to connect Dremio to Pandas using the Arrow API and the ODBC driver.
- Performance comparisons between ODBC and the Arrow API.

Even if you can't make it, register anyway, and we'll send you the recording.
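The conversion cost mentioned in the abstract can be illustrated with a small standard-library sketch. It is not Dremio's or Pandas' actual code path: the sample rows, the column names, and the `rows_to_columns` helper are all hypothetical, chosen only to show why record-oriented results need a pivot before they fit a column-oriented structure such as a DataFrame.

```python
# Hypothetical result set, shaped like the list of tuples a typical
# database cursor returns: one tuple per record.
column_names = ["id", "city", "temp_c"]
rows = [
    (1, "Oslo", 4.5),
    (2, "Lima", 19.0),
    (3, "Pune", 31.2),
]


def rows_to_columns(names, records):
    """Pivot record-oriented data into one list per column (illustrative helper)."""
    columns = {name: [] for name in names}
    for record in records:                    # every row is visited...
        for name, value in zip(names, record):
            columns[name].append(value)       # ...and every value is copied
    return columns


columns = rows_to_columns(column_names, rows)
print(columns["city"])
```

Data that already arrives in columnar form skips this pivot. With the `pyarrow` library, for example, an Arrow table is converted with `Table.to_pandas()`, which is one of the conversion functions the abstract refers to.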
