Build a Best In Class Data Lake

We have identified the best practices for building a best-in-class data lake. In this webinar, we will explore the advantages of cloud data lakes, the major challenges of building a data lake in the cloud, what a modern data architecture looks like and how best to implement it, and how different industries apply these best practices in real-world scenarios.
Recorded Jun 4 2020 36 mins
Presented by
Lucio Daza, Scott Gay

  • Dremio Fall Release Recorded: Mar 19 2021 58 mins
    Tomer Shiran, Co-founder, CPO, Dremio and Tom Fry, Director of Product Management, Dremio
    Dremio now can significantly shrink your data warehousing costs and complexity.

    That’s because our Fall 2020 Release lets you run low-latency, high-concurrency BI queries directly on Amazon S3 and Azure Data Lake Storage (ADLS). And that’s how Dremio drastically reduces—and in some cases, altogether eliminates—the need to copy and move data into proprietary data warehouses.

    Now, with Dremio, you no longer have to invest weeks of engineering work to update dashboards or datasets, and your BI users can achieve significantly faster time to insight.

    Key capabilities in the Dremio Fall 2020 Release include:

    - Sub-second query response times
    - Support for thousands of concurrent users and queries
    - Significant query speedup on star schemas
    - One-click Power BI integration with an automatic DirectQuery connection
  • Building an Efficient Data Architecture for Maximum Productivity Recorded: Aug 5 2020 59 mins
    Tomer Shiran
    Are your data engineering teams spending weeks or even months on tedious ETL, OLAP cubes and BI extracts in order to provision data sets with enough performance for your BI and data science stakeholders?

    It is finally possible to shrink that time to minutes with a dramatically different—and simpler—data architecture. What’s needed is a new category of query engine, a data lake engine, that is purpose-built for lightning-fast queries directly against cloud data lake storage like Amazon S3 and Microsoft ADLS.

    Meet the co-founder & CPO of Dremio in this live webinar, as he describes how you can increase productivity and efficiency while also reducing time to insight.
  • The Rise of the Cloud Data Lake Engine: Architecting for Real-Time Queries Recorded: Jul 29 2020 55 mins
    Jason Nadeau, Wayne Eckerson, Kevin Petrie
    Cloud data lakes host a rising tide of increasingly diverse datasets and analytics workloads. Elastic resources and flexible data integration make the cloud data lake a powerful complement, and sometimes even an alternative, to the data warehouse. However, enterprises still struggle to manage the performance, data access, governance and cost of their cloud data lakes.

    A new category of technology, the cloud data lake engine, has emerged to resolve these challenges and usher in a new era of interactive OLAP. It merges the structural advantages of SQL-oriented data warehouses with the scale and efficiency of cloud object stores. But how do cloud data lake engines work? What do they mean for data engineers, architects and analysts?

    Join Eckerson Group and Dremio as we discuss the components of a cloud data lake engine, as well as common benefits, adoption trends and use cases.
  • Best Practices for Building a Fast and Reliable IoT Data Pipeline Recorded: Jun 30 2020 58 mins
    Ryan Murray, Chris Furlong, Jeff King
    The amount of data produced by IoT is expected to reach 4.4 zettabytes in 2020, up from just 0.1 zettabytes in 2013. Of course, the fundamental principle of IoT is making swift, data-driven decisions—all this data is only valuable if it can be analyzed. Enterprises need to collect data from multiple IoT devices and store that data in a data lake with the ultimate goal of analyzing and gaining insights from it. Sounds simple, right?

    Unfortunately, setting up a fast and reliable data pipeline that enables enterprises to obtain value from their IoT data can be overwhelmingly complex and costly. Join us to learn from subject matter experts from Microsoft, Software AG and Dremio as we explore these challenges and best practices for addressing them.

    What you will learn:

    - Strategies for building a scalable and cost-effective data lake architected for large-scale analytics

    - Best practices for storing data emitted from IoT devices in a highly efficient format that’s suitable for analytical queries

    - How to run ad-hoc queries as well as more sophisticated analytical queries directly against IoT data stored in the data lake

    - How to build a data pipeline that empowers data scientists to aggregate and analyze IoT and business data from multiple sources for maximum insight
  • How a Self-Service Semantic Layer for Your Data Lake Saves You Money Recorded: Jun 16 2020 27 mins
    Lucio Daza, Scott Gay
    Semantic layers are business representations of an organization’s data assets that help users access and gain value from those assets using common business terms. While this concept is not new, the innovation of the cloud data lake magnifies the potential of semantic layers far beyond their original intent.

    Join our webinar to learn how to successfully implement a semantic layer on the data lake to minimize your data pipeline complexities and allow business users to interact with data without needing to know where and how it is physically stored and organized.

    Register for this session to learn:
    - Common challenges with semantic layers and how to overcome them
    - How a semantic layer reduces pipeline complexity
    - Best practices to successfully implement a semantic layer on the data lake
  • Build a Best In Class Data Lake Recorded: Jun 4 2020 36 mins
    Lucio Daza, Scott Gay
    We have identified the best practices for building a best-in-class data lake. In this webinar, we will explore the advantages of cloud data lakes, the major challenges of building a data lake in the cloud, what a modern data architecture looks like and how best to implement it, and how different industries apply these best practices in real-world scenarios.
  • Apache Arrow: Delivering high performance data lake access with Hyperrest Recorded: Apr 15 2020 34 mins
    Sudheesh Katkam
    Pandas is one of the most popular data analytics frameworks for Python; it supports many data sources including relational databases as long as a compatible ODBC driver exists. However, the ODBC API is designed around a record-centric paradigm, so some processing is required to convert data received from an ODBC source to a Pandas DataFrame. As a result, ODBC access is relatively slow compared to other approaches to data access.
    Dremio uses Apache Arrow in-memory columnar storage to represent data internally. The Arrow format is very similar to a Pandas DataFrame, and the Apache Arrow project provides fast conversion functions between Arrow and Pandas. Dremio also offers both an ODBC driver for general-purpose tools and an API to return Arrow data directly to Pandas. This webinar explores how the two of them compare.
    Join this webinar to learn:
    - How Dremio uses Hyperrest, a new Apache Arrow API.
    - How to connect Dremio to Pandas using the Arrow API and the ODBC Driver.
    - Performance comparisons between ODBC and Arrow API.
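
As a rough illustration of why a record-centric API needs a conversion step before columnar analysis, the sketch below pivots row tuples into per-column lists using only the Python standard library. The sample rows, the column names, and the `rows_to_columns` helper are hypothetical and exist only for this example; it does not show Dremio's, Arrow's, or any ODBC driver's actual implementation.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_columns(
    names: Sequence[str], rows: Sequence[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Pivot record-oriented rows into a column-oriented mapping."""
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match column names")
        # Every value is touched once: this per-row work is the conversion
        # cost a record-centric source pays before columnar analysis.
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Hypothetical result set, shaped the way a row-at-a-time cursor returns it.
names = ["device_id", "temperature"]
rows = [(1, 20.5), (2, 21.0), (3, 19.8)]

columns = rows_to_columns(names, rows)
print(columns["temperature"])
```

A source that already hands data over column by column can skip this pivot, which is the general reason a columnar transfer format may reduce conversion overhead.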

    Even if you can't make it, register anyway, and we'll send you the recording.
  • Data Reflections Recorded: Apr 8 2020 59 mins
    Steven Philips
    In order to make raw data available for business users or data scientists to consume, companies often develop complex ETL pipelines in which data is copied many times between systems. These can be hard to maintain and prone to breakage.

    Dremio has developed a new approach called data reflections. When used in conjunction with a cost-based optimizer such as Apache Calcite, data reflections can help accelerate queries without the need for data engineers to manually create data copies or data consumers to interact with different materializations of data to achieve the desired performance.

    In addition, data reflections provide separation between the logical world, where analysts and data scientists need to curate and transform the model of the data, and the physical world, where data must be physically optimized in order to enable execution engines to respond to queries in real-time.

    In addition to providing an overview of data reflections and explaining the technological underpinnings, we’ll offer a live demo of Dremio, which uses reflections to automatically accelerate workloads without having to explicitly move or copy data and while affording users the freedom to curate and transform the logical model of the data.

    We’ll also talk about how to get started with Dremio, and share a few specific use cases.
  • [Replay] Top 5 Reasons Why You Are Still Asking Small Questions to Your Big Data Recorded: Apr 1 2020 59 mins
    Lucio Daza, Scott Gay
    The unprecedented level of cloud data maturity offers one of the most agile environments for analytics ever imagined. Flexible cost models, plus the ability to store, process, and analyze data directly on the cloud, are just some of the many reasons why the cloud became the go-to platform for data lakes. However, despite all these advantages, many companies are not using these resources in an optimal way.

    We surveyed hundreds of data consumers, data architects, and data executives to understand and get a clear picture of where cloud and data lake modernization is, where it is headed, and why it still represents a challenge for many. In this webinar, we will examine:

    - The current usage trends of cloud data lakes
    - The main challenges organizations face when moving to the cloud
    - How these challenges can be addressed successfully with the right technology in place
  • Simplifying Data Pipeline Recorded: Mar 18 2020 54 mins
    Lucio Daza, Mark Johnson
    Today, enterprises collect data from multiple services and devices. That data typically is moved between multiple storage systems and eventually stored in a data lake. The ultimate goal is to analyze and gain insights from this data, but it’s challenging to put together a reliable and fast data pipeline to derive value from data generated from a wide variety of sources.

    Solution Architect Mark Johnson will show you how you can radically simplify your data pipelines. Join this webinar to learn:

    - Why data pipelines are so complex
    - How to identify issues with your data pipeline before it’s too late
    - How to simplify your data pipeline and reduce costs by over 40%
    - What key elements are needed for a successful data pipeline
    - How to leverage data lake engines to eliminate costly pipeline complexities while maintaining full control and increasing speed and accessibility to your data

    By simplifying the data pipeline, you’ll be able to spend more time gaining insights from your data. Can’t make it? Register anyway, and we’ll send you the recording.
  • Using a Data Lake Engine to Create a Scalable and Lightning Fast Data Pipeline Recorded: Mar 5 2020 50 mins
    Justin Dunham, Ryan Murray
    While on-premises data lakes co-locate compute and storage, cloud data lakes separate compute from storage. As a result, cloud data lakes can be fraught with complexity, poor performance, and high query cost.

    Join us to learn how to:
    - Implement a Data Lake Engine on your cloud data storage
    - Reduce your cloud data lake query cost with a Data Lake Engine
    - Achieve lightning-fast query performance

    We will be showing real-world use cases on cloud data lakes. Can’t make it? Register anyway, and we’ll send you the recording.
  • Creating a cloud data lake for a $1 trillion organization Recorded: Feb 20 2020 59 mins
    Jeff King, Roberto Maybin
    NewWave is a trusted systems integrator for the Centers for Medicare and Medicaid Services (CMS), the largest healthcare payer in the US. Using Dremio’s Data Lake Engine & Microsoft ADLS Gen2, NewWave is modernizing and transforming CMS’ data architecture.

    Join us to hear from Microsoft, NewWave, and Dremio. We’ll cover:

    - Key challenges of legacy data analytics architectures and healthcare data
    - How a data lake engine, together with Azure Data Lake Storage, helps you overcome these challenges
    - How to support self-service data without vendor lock-in

    Join us on September 12th. Can’t make it? Register anyway, and we’ll send you the recording.
  • How Dremio and Tableau enable cloud data lake analytics at InCrowd Sports Recorded: Feb 6 2020 60 mins
    Jason Nadeau, Ciaran Fisher, Conor Knowles
    Savvy data-driven enterprises are moving from a proprietary data warehouse model to an open data lake model running in the cloud to gain control of their data and lower their costs; however, many organizations run into performance and operational challenges in the transition to this new architecture.

    This webinar examines how you can use Dremio and Tableau together to unlock the full power of business intelligence directly against cloud data lake storage, all without cubes, aggregation tables, or extracts. We’ll show you how your data architects and engineers gain flexibility and control, and your data analysts and BI users gain live, interactive self-service. We’ll also share how InCrowd Sports, a joint customer, uses Dremio and Tableau to provide insights that enable sports players, teams, and leagues to connect with fans directly, driving value for everyone.

    Join Jason Nadeau; Conor Knowles of Tableau; and Ciaran Fisher, CTO of InCrowd Sports, on Feb 6 at 11 AM PST to learn how to:

    - Accelerate query performance for BI and Data Science workloads
    - Visualize data directly on your data lake
    - Secure and govern your data lake
    - Leverage multiple data sources and bring them to life with Tableau
    - Create a self-service semantic layer across all of your data sources

    Even if you can't make it, register anyway, and we'll send you the recording.
  • Top 5 Data Predictions Recorded: Jan 22 2020 35 mins
    Jason Nadeau
    As we conclude 2019, what can you expect in 2020 in terms of key data trends, particularly when it comes to cloud data lakes? Who will emerge as the winner in the cloud data game?

    Data usage is growing exponentially, and with this growth comes a variety of new use cases, challenges, and opportunities. Join this live webinar on January 21 at 11 am PT with Jason Nadeau, Dremio VP of Strategy, who looks to next year to predict the data technology trends that will impact businesses in 2020.

    Watch the webinar to learn:

    - What are the key data industry predictions for 2020?
    - What’s the future of cloud data lakes?
    - How can you take advantage of these data trends?

    Wherever you are on your data lake journey, don’t miss this chance to hear about the top data industry predictions for 2020. Can’t make it? Register anyway, and we’ll send you the recording.
  • Data-as-a-Service on the Data Lake Recorded: Apr 17 2019 48 mins
    Kelly Stirman, CMO and VP Strategy, Dremio
    Enterprises have been using data lakes to take advantage of their flexibility, cost model, elasticity, and scalability – but access to the data is a challenge for analysts and BI users.

    Join this webinar to learn how to:

    - Create a self-service semantic layer across all of your data sources, including your data lake
    - Accelerate query performance for BI and data science workloads
    - Secure, govern and mask sensitive data from any source
    - Build a zero-copy data curation model

    And do it all using open source technologies.

  • Title: Build a Best In Class Data Lake
  • Live at: Jun 4 2020 6:00 pm
  • Presented by: Lucio Daza, Scott Gay