Pandas is one of the most popular data analytics frameworks for Python. It supports many data sources, including relational databases, as long as a compatible ODBC driver exists. However, the ODBC API is designed around a record-centric (row-by-row) paradigm, while a Pandas DataFrame is organized by column, so data received from an ODBC source must be converted before it can be used as a DataFrame. That conversion step makes ODBC access relatively slow compared with approaches that deliver columnar data directly.
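To make the conversion cost concrete, here is a minimal sketch of the row-to-column pivot that a record-centric source requires. The sample rows, the column names, and the `rows_to_columns` helper are hypothetical illustrations, not part of any ODBC driver or of Pandas itself.

```python
# Hypothetical rows, shaped like what a record-oriented cursor hands back:
# one tuple per record.
rows = [(1, "alice", 3.5), (2, "bob", 4.0), (3, "carol", 2.5)]
column_names = ["id", "name", "score"]


def rows_to_columns(rows, column_names):
    """Pivot row tuples into one list per column.

    Every value of every row is touched once, which is the extra work
    a columnar source avoids.
    """
    columns = {name: [] for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


columns = rows_to_columns(rows, column_names)
```

The work grows with rows times columns, so the overhead becomes more noticeable as result sets get larger.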
Dremio uses the Apache Arrow in-memory columnar format to represent data internally. The Arrow format is very similar to a Pandas DataFrame, and the Apache Arrow project provides fast conversion functions between Arrow and Pandas. Dremio offers both an ODBC driver for general-purpose tools and an API that returns Arrow data directly to Pandas. This webinar explores how the two approaches compare.
Join this webinar to learn:
- How Dremio uses Hyperrest, a new Apache Arrow API.
- How to connect Dremio to Pandas using the Arrow API and the ODBC Driver.
- How ODBC and the Arrow API compare on performance.
Even if you can't make it, register anyway, and we'll send you the recording.