Name: Apache Arrow: Delivering high performance data lake access with Hyperrest
Start: 2020-04-15T18:00:00Z
End: 2020-04-15T18:00:34.000Z
Location: BrightTALK
Rating: 0

Dremio can now significantly shrink your data warehousing costs and complexity.

That’s because our Fall 2020 Release lets you run low-latency, high-concurrency BI queries directly on Amazon S3 and Azure Data Lake Storage (ADLS). And that’s how Dremio drastically reduces—and in some cases, altogether eliminates—the need to copy and move data into proprietary data warehouses.

Now, with Dremio, you no longer have to invest weeks of engineering work to update dashboards or datasets, and your BI users can achieve significantly faster time to insight.

 
Key capabilities in the Dremio Fall 2020 Release include:

- Sub-second query response times
- Support for thousands of concurrent users and queries
- Significant query speedup on star schemas
- One-click Power BI integration with an automatic DirectQuery connection

Dremio Fall Release

Are your data engineering teams spending weeks or even months on tedious ETL, OLAP cubes and BI extracts in order to provision data sets with enough performance for your BI and data science stakeholders?

It is finally possible to shrink that time to minutes with a dramatically different—and simpler—data architecture. What’s needed is a new category of query engine, a data lake engine, that is purpose-built for lightning-fast queries directly against cloud data lake storage like Amazon S3 and Microsoft ADLS.

Meet the co-founder & CPO of Dremio in this live webinar, as he describes how you can increase productivity and efficiency while also reducing time to insight.

Building an Efficient Data Architecture for Maximum Productivity

Cloud data lakes host a rising tide of increasingly diverse datasets and analytics workloads. Elastic resources and flexible data integration make the cloud data lake a powerful complement, and sometimes even an alternative, to the data warehouse. However, enterprises still struggle to manage the performance, data access, governance and cost of their cloud data lakes.

A new category of technology, the cloud data lake engine, has emerged to resolve these challenges and usher in a new era of interactive OLAP. It merges the structural advantages of SQL-oriented data warehouses with the scale and efficiency of cloud object stores. But how do cloud data lake engines work? What do they mean for data engineers, architects and analysts?

Join Eckerson Group and Dremio as we discuss the components of a cloud data lake engine, as well as common benefits, adoption trends and use cases.

The Rise of the Cloud Data Lake Engine: Architecting for Real-Time Queries

The amount of data produced by IoT is expected to reach 4.4 zettabytes in 2020, up from just 0.1 zettabytes in 2013. Of course, the fundamental principle of IoT is making swift, data-driven decisions—all this data is only valuable if it can be analyzed. Enterprises need to collect data from multiple IoT devices and store that data in a data lake with the ultimate goal of analyzing and gaining insights from it. Sounds simple, right?

Unfortunately, setting up a fast and reliable data pipeline that enables enterprises to obtain value from their IoT data can be overwhelmingly complex and costly. Join us to learn from subject matter experts from Microsoft, Software AG and Dremio as we explore these challenges and best practices for addressing them.

What you will learn:

- Strategies for building a scalable and cost-effective data lake architected for large-scale analytics

- Best practices for storing data emitted from IoT devices in a highly efficient format that’s suitable for analytical queries

- How to run ad-hoc queries as well as more sophisticated analytical queries directly against IoT data stored in the data lake

- How to build a data pipeline that empowers data scientists to aggregate and analyze IoT and business data from multiple sources for maximum insight

Best Practices for Building a Fast and Reliable IoT Data Pipeline

Semantic layers are business representations of an organization’s data assets that help users access and gain value from those assets using common business terms. While this concept is not brand new, the innovation of the cloud data lake magnifies the potential of semantic layers far beyond their original intent.

Join our webinar to learn how to successfully implement a semantic layer on the data lake to minimize your data pipeline complexities and allow business users to interact with data without needing to know where and how it is physically stored and organized.

Register for this session to learn:
- Common challenges with semantic layers and how to overcome them
- How a semantic layer reduces pipeline complexity
- Best practices to successfully implement a semantic layer on the data lake

How a Self-Service Semantic Layer for Your Data Lake Saves You Money

We have identified the best practices for building a best-in-class data lake. In this webinar, we will explore the advantages of cloud data lakes; the major challenges of building a data lake in the cloud; what a modern data architecture looks like and how best to implement it; and how different industries apply these best practices in real-world scenarios.

Build a Best In Class Data Lake

In order to make raw data available for business users or data scientists to consume, companies often develop complex ETL pipelines in which data is copied many times between systems. These can be hard to maintain and prone to breakage. 
 
Dremio has developed a new approach called data reflections. When used in conjunction with a cost-based optimizer such as Apache Calcite, data reflections can help accelerate queries without the need for data engineers to manually create data copies or data consumers to interact with different materializations of data to achieve the desired performance. 
 
In addition, data reflections provide separation between the logical world, where analysts and data scientists need to curate and transform the model of the data, and the physical world, where data must be physically optimized in order to enable execution engines to respond to queries in real-time.
 
In addition to providing an overview of data reflections and explaining the technological underpinnings, we’ll offer a live demo of Dremio, which uses reflections to automatically accelerate workloads without having to explicitly move or copy data and while affording users the freedom to curate and transform the logical model of the data.

We’ll also talk about how to get started with Dremio, and share a few specific use cases.

Data Reflections

Pandas is one of the most popular data analytics frameworks for Python; it supports many data sources including relational databases as long as a compatible ODBC driver exists. However, the ODBC API is designed around a record-centric paradigm, so some processing is required to convert data received from an ODBC source to a Pandas DataFrame. As a result, ODBC access is relatively slow compared to other approaches to data access.
Dremio uses Apache Arrow in-memory columnar storage to represent data internally. The Arrow format is very similar to a Pandas DataFrame, and the Apache Arrow project provides fast conversion functions between Arrow and Pandas. Dremio also offers both an ODBC driver for general-purpose tools and an API to return Arrow data directly to Pandas. This webinar explores how the two of them compare.
Join this webinar to learn:
- How Dremio uses Hyperrest, a new Apache Arrow API.
- How to connect Dremio to Pandas using the Arrow API and the ODBC Driver.
- Performance comparisons between ODBC and Arrow API.
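
To make the record-centric versus columnar point concrete, here is a minimal sketch in plain Python. It is not Dremio's API or the Arrow API; the names `rows`, `rows_to_columns`, `sensor_id` and `reading` are hypothetical and exist only for illustration. It shows the pivot step that a row-oriented interface implies before the data is laid out column by column, the layout a DataFrame or Arrow table uses.

```python
# Hypothetical rows, shaped the way a record-centric driver typically
# returns them: one tuple per record.
rows = [(1, 9.5), (2, 7.25), (3, 8.0)]


def rows_to_columns(records, names):
    """Pivot row tuples into one list per column name."""
    columns = {name: [] for name in names}
    for record in records:
        # Every value of every row is visited once; this is the
        # conversion work a columnar transfer can avoid.
        for name, value in zip(names, record):
            columns[name].append(value)
    return columns


columns = rows_to_columns(rows, ["sensor_id", "reading"])
```

When the source already delivers data in columnar form, this per-value loop is unnecessary, which is one plausible reason for the performance gap the webinar sets out to measure.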

Even if you can't make it, register anyway, and we'll send you the recording.

Apache Arrow: Delivering high performance data lake access with Hyperrest

Data Analytics

Data Architecture

Cloud Analytics

Data Lake

Business Analytics

Business Intelligence

Cloud computing has exploded over the past few years, delivering a previously unimagined level of workplace mobility and flexibility. The cloud computing community on BrightTALK is made up of thousands of engaged professionals learning from the latest cloud computing research and resources. Join the community to expand your cloud computing knowledge and have your questions answered in live sessions with industry experts and vendor representatives.

Cloud Computing

Increasing expectations for good governance, effective risk management and complex demands for corporate compliance are presenting a growing challenge for organizations of all sizes. Join industry thought leaders as they provide you with practical advice on how to implement successful risk and compliance management strategies across your organization. Browse risk management resources in the form of interactive webinars and videos and ask questions of expert GRC professionals.

IT Governance, Risk and Compliance

Practicing business intelligence allows your company to transform raw data into sets of insights for targeted business growth. The business intelligence and analytics community on BrightTALK is made up of thousands of data scientists, database administrators, business analysts and other data professionals. Find relevant webinars and videos on business analytics, business intelligence, data analysis and more presented by recognized thought leaders. Join the conversation by participating in live webinars and round table discussions.

Business Intelligence and Analytics

Welcome to the big data and data management community on BrightTALK. Join thousands of data quality engineers, data scientists, database administrators and other professionals to find more information about the hottest topics affecting your data. Subscribe now to learn about efficiently storing, optimizing a complex infrastructure, developing governing policies, ensuring data quality and analyzing data to make better informed decisions. Join the conversation by watching live and on-demand webinars and take the opportunity to interact with top experts and thought leaders in the field.

Big Data and Data Management

As an IT professional, you face problems that are multifaceted and complex and that don’t lend themselves to simple solutions. The information technology community features useful and free information technology resources. Join to browse thousands of videos and webinars on ITIL best practices, IT security strategy and more presented by leading CTOs, CIOs and other technology experts.

Information Technology

Get powerful insights to run your business. CEOs, CFOs, CMOs, COOs, CTOs and CIOs are among the leaders in this community who connect with peers and experts to grow their businesses.

CxO Strategy

In the business management community, commercial experts will be sharing webinars and videos on an array of subjects. From sales and marketing, to finance, productivity and growth, this content will help you stay up-to-date with the current economic environment and provide timely management insight to drive business growth.
