Name: Optimizing Data Files in Apache Iceberg: Performance strategies
Start: 2023-06-13T15:00:00Z
End: 2023-06-13T15:00:42.000Z
Location: BrightTALK
Rating: 5

Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all of your data. Dremio increases agility with a revolutionary data-as-code approach that enables Git-like data experimentation, version control, and governance.

Organizations aim to increase data access and lower the time it takes to gain insights, all while managing governance and controlling rising data costs.

Dremio’s unified lakehouse platform for self-service analytics enables data consumers to move fast while also reducing manual repetitive tasks and ticket overload for data engineers.

In this Gnarly Data Waves episode, you will learn: 
- Overview of Dremio: what it is and why it is growing rapidly
- Proven use cases from some of the most demanding customers in the world
- A demonstration of how to get started rapidly and try it out

#datalakehouse #analytics #datawarehouse #datalake #dataengineers #dataarchitects #governance #dremiocloud #opendatalakehouse #apacheiceberg #selfservice #enterprisedata #multitables #tableformat #automateddata #query #etl #pipelines #genai #generativeai #parquet #json #tableau #bi #shiftleft #usecases #tco #datamanagement #views

Getting Started with Dremio

Traditional ETL processes are notorious for their complexity and cost inefficiencies. Watch this video as we introduce a game-changing virtual data pipeline approach with Dremio's next-gen DataOps, aimed at streamlining, simplifying, and fortifying your data pipelines to save time and reduce cost.

You'll gain insights in this video:
- Simplified Data Pipeline Management: How to use Dremio for data source branching, merging, and pipeline automation.
- Mastering Data Ingestion and Access: Learn how to curate data using virtual data marts accessed through a universal semantic layer.
- Better Orchestration with dbt: Discover the benefits of orchestrating DML and view logic, optimizing data workflows.
- Elevating Data Quality: Learn techniques to automate lakehouse maintenance and improve data integrity.

Next-Gen Data Pipelines are Virtual

S&P Global is a leading global financial services company headquartered in New York. It provides credit ratings, benchmarks, analytics, and workflow solutions in the global capital, commodity, and automotive markets. Data is an essential asset across all of S&P Global’s solution offerings.

Watch Tian de Klerk, Director of Business Intelligence, as he shares how they built a data lakehouse for FinOps analysis with Dremio Cloud on Microsoft Azure.

In this video, you will learn about:
- The hidden costs of extracting operational data into BI cubes
- Simplifying traditional data engineering processes with Dremio’s zero-ETL lakehouse
- How Dremio’s semantic layer and query acceleration make self-service analytics easy for end users

How S&P Global is Building an Azure Data Lakehouse with Dremio

Companies are struggling with the complex, brittle, and expensive nature of the data lifecycle in existing analytical environments. Dremio is announcing the availability of Dremio Cloud on Microsoft Azure, providing companies the ability to simplify and optimize their analytical environment. 

Watch Jonny Dixon, Sr. Product Manager at Dremio, and Hanno Borns, Principal Product Manager at Microsoft Azure, as they look into:
- Problems companies face with existing analytical architectures
- How Dremio and Microsoft Azure work together
- What Dremio Cloud on Azure is, and the value it provides
- How the Dremio Cloud on Azure solution works, with a demo

Empowering Analytics: Unleashing the Power of Dremio Cloud on Microsoft Azure

Data integration is the foundation of modern business. When organizations consolidate the ingestion, cleansing, and transformation of disparate data sources into high-performance pipelines, they can drive analytics insights into every decision.

To keep up with fast-changing business requirements, enterprises must follow current best practices in data integration. Chief among these is migrating data integration pipelines to scalable, resilient, and agile cloud platforms.

In this panel discussion video, TDWI senior research director James Kobielus will engage data industry experts in an in-depth discussion of data integration trends and best practices. 

The discussion will focus on several key issues:
- What are the hallmarks of modern data integration?
- What trends are spurring enterprises to modernize their data integration capabilities?
- Why should enterprises modernize their data integration platforms, processes, and organizations?
- What core capabilities are essential for deploying a full-featured enterprise data integration stack?
- What emerging data integration best practices are needed to support sophisticated new use cases in artificial intelligence, distributed analytics, and low-latency streaming?
- What new techniques should enterprises consider for reducing the cost and improving the efficiency of their data integration processes?
- How feasible is it for enterprises to entirely automate their data integration processes?
- To what extent can and should self-service tools be used to help business analysts and other nontraditional roles build and deploy sophisticated data integration pipeline logic?
- What is the essential first step for enterprises on their journeys to modern data integration?

Watch this video and learn the data integration trends and best practices for your organization.

Expert Panel Discussion – Data Integration Trends and Best Practices Webinar

Watch this live fireside chat with David Stodder, Senior Director of Research for Business Intelligence at TDWI, and Nik Acheson, Senior Product and Strategy Leader at Dremio, as they talk about using data mesh to advance distributed data access, agility, and governance. During this informative session, you will learn:

- Best practices for success in the data mesh journey so you can make it easier to discover, understand, and trust data
- The importance of metadata catalogs, business glossaries, and data intelligence for integrating discovery, access, and governance
- How data mesh, data fabrics, and data virtualization differ and are related
- The role of an open data lakehouse in a distributed data architecture
- Balancing self-service data domains with requirements for enterprise data governance
- Sorting out data virtualization, data mesh and data fabrics
- Role of metadata catalogs, business glossaries and semantic layer
- Data mesh and the open data lakehouse: How they fit together
- The data mesh journey: Lessons learned and best practices
- Improving the user experience and increasing business value

Using Data Mesh to Advance Distributed Data Access, Agility and Governance

Supply Chain of the Future: Take a deep dive with Mark Sear from Maersk and hear how he is delivering the supply chain of the future for one of the largest supply chains in the world.

Data as a Force Multiplier

Watch Mike Ferguson, CEO at Intelligent Business Strategies, discuss top data trends and the outlook ahead.

Top Data Trends with Mike Ferguson, CEO at Intelligent Business Strategies

Watch Sendur Sellakumar, CEO of Dremio, present The State of the Lakehouse, and benchmark your organization against recent data and AI trends from a new survey of over 500 organizations.

The State of the Lakehouse with Sendur Sellakumar, CEO of Dremio

Watch Sendur Sellakumar, CEO at Dremio, Mike Ferguson, CEO at Intelligent Business Strategies and Mark Sear, Director of Data Analytics and AI/ML at Maersk discuss 2024 trends and predictions: Industry experts provide valuable insights, shaping the year ahead with their knowledge and experience.

2024 Predictions Panel discussion with Industry experts

Dremio delivers no-compromise lakehouse analytics for all of your data, and recent launches are making Dremio faster, more reliable, and more flexible than ever.

Watch Mark Shainman and Colleen Quinn, both in Product Marketing at Dremio, as they present what’s new in Dremio:
- New Gen-AI capabilities for automated data descriptions and labeling
- Dremio Cloud SaaS service now available on Microsoft Azure
- Advances to ensure 100% query reliability with no memory failures
- Expanded Apache Iceberg capabilities to streamline Iceberg adoption and improve performance

What’s new in Dremio: New GenAI capabilities, 100% query success + on Azure

Embark on a transformative journey with our insightful presentation, "ZeroETL & Virtual Data Marts: The Cutting Edge of Lakehouse Architecture." In this engaging session, we'll delve into the intricacies of modern data engineering and how it has evolved to address key pain points in the realm of data processing.

Alex Merced, Developer Advocate, will illuminate the challenges data engineers face, from the complexities of backfilling and brittle pipelines to the frustration of sluggish data delivery. We'll introduce you to the high-impact concepts of ZeroETL and Virtual Data Marts, demonstrating how these innovative patterns can dramatically alleviate these common pains. By reducing the need for manual data movement and preparation pipelines, you'll discover a more efficient, agile, and responsive data ecosystem.

Watch this video for a practical guide to implementing these transformative patterns. We'll walk you through the steps to bring the power of ZeroETL and Virtual Data Marts into your own data landscape. Leveraging cutting-edge tools like Dremio, dbt, and more, you'll gain hands-on experience in designing and deploying these patterns to streamline your data workflows and supercharge your analytics capabilities.

Don't miss this opportunity to stay at the forefront of data architecture, enabling your organization to harness data's full potential while reducing complexity and overhead. Explore the future of data engineering, a future where ZeroETL and Virtual Data Marts pave the way for data agility, speed, and innovation.

ZeroETL & Virtual Data Marts: The Cutting Edge of Lakehouse Architecture

In this engaging talk with Alex Merced, Developer Advocate, we'll explore how Dremio revolutionizes data access, delivering speed, simplicity, and substantial cost savings. 

Discover the power of Dremio as we dive deep into:
- Data Access at Lightning Speed: Learn how Dremio accelerates data access, making insights available in real time.
- Simplicity in Data Preparation: Streamline your data pipeline with Dremio's intuitive interface for data transformation.
- Cost Efficiency: Uncover how Dremio’s optimizations save you money while improving performance
- Use Cases: Explore real-world success stories and applications of Dremio's data access solutions.
- Future-Proofing Your Data Infrastructure: Understand how Dremio ensures scalability and adaptability.

Watch this video to uncover the secrets of fast, easy data access without breaking the bank!

How Dremio provides you fast and easy data access while saving you money

Organizations are struggling with the proliferation of tooling in their data infrastructure, and the exponential growth of ETL pipelines is slowing down data engineers' ability to deliver value to the business. They want to spend more time making impactful decisions and working on high-value projects. Fivetran significantly reduces the time spent building ETL pipelines with its no-code approach. Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all your data. Together, Dremio and Fivetran help organizations go to market (GTM) faster.

In this video, you will learn:
- What the Iceberg table format is and why it matters in data lakehouses
- How to load source files into Iceberg tables using Fivetran 
- How to create a unified access layer for your data with Dremio Cloud

How To Build an Iceberg Data Lakehouse with Fivetran and Dremio

Watch this insightful webinar featuring Jacopo Tagliabue of Bauplan Labs as he dives into the world of data science and machine learning pipelines. In this video, you'll discover the rationale behind Bauplan Labs' choice of open-source technologies, such as Apache Iceberg table format and Project Nessie transactional data catalog, for their cutting-edge platform. Gain valuable insights into why modern data platforms are increasingly adopting these technologies and how Nessie's git-like features can revolutionize your data management. Don't miss out on this opportunity to stay ahead in the world of data science and technology!

About Project Nessie - https://www.dremio.com/blog/introducing-nessie-as-a-dremio-source/

What you will learn:
- Why Modern Data Platforms are being built on Apache Iceberg
- Why Modern Data Platforms are being built on Nessie

Building a Data Science Platform on Apache Iceberg and Nessie

Product analytics offers companies a transformative opportunity to elevate the customer experience and proactively understand customer behavior. This personalized understanding allows companies to tailor their product offerings, provide targeted recommendations, and streamline customer journeys, resulting in a more engaging, satisfying, and loyalty-inducing customer experience. Effective product analytics is a comprehensive strategy to proactively manage support and promote customer success.

NetApp, a leading global company specializing in hybrid cloud data services, helps enterprises build a simple and secure way to drive innovation wherever their data and applications live. The customer experience is a core driver within NetApp’s portfolio of solution offerings.

Watch Aaron Sims, Technical Director at NetApp as he shares his experience building out a unified access layer for product analytics with Dremio.

In this episode, you will learn:
- NetApp’s journey to unified analytics with Dremio’s phased approach for Hadoop modernization
- How a unified access layer makes data easier to discover and explore for your end users without data duplication
- Ways to maximize your existing infrastructure investments for improved ROI and lower TCO with Dremio.

How NetApp is Redefining the Customer Experience with Product Analytics

Watch this video to learn about the common challenges of using traditional file formats on premises and how leveraging Apache Iceberg on AWS helps you overcome these challenges. You will also learn about the comprehensive and advanced features of Apache Iceberg with elaborate demos that showcase the unique capabilities of Apache Iceberg on AWS.

Deep Dive into Apache Iceberg on AWS

Querying hundreds of petabytes of data demands optimized query performance, especially as data accumulates over time. Queries have to stay efficient even though a table can gradually accumulate many small files and its data may no longer be optimally organized.
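Compaction addresses the small-files problem by rewriting many small data files into fewer, larger ones. The sketch below is a plain-Python illustration, not Iceberg code: it plans rewrite groups with a first-fit-decreasing bin-packing heuristic. The function name `plan_compaction`, the 512 MB target size, and the 128 MB "small file" threshold are assumptions chosen for the example. In Iceberg itself, compaction is usually run through an engine, for example the Spark `rewrite_data_files` procedure with its `binpack` strategy.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[float], target_mb: float = 512.0,
                    small_threshold_mb: float = 128.0) -> List[List[float]]:
    """Group small files into rewrite bins whose total size stays <= target_mb.

    Hypothetical helper: files at or above small_threshold_mb are left alone,
    and small files are packed with a first-fit-decreasing heuristic.
    """
    # Largest small files first, so big pieces claim bins before the filler.
    small = sorted((s for s in file_sizes_mb if s < small_threshold_mb), reverse=True)
    bins: List[List[float]] = []
    totals: List[float] = []
    for size in small:
        # Put the file into the first bin that still has room for it.
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin fits: open a new one.
            bins.append([size])
            totals.append(size)
    # A bin holding a single file gains nothing from being rewritten.
    return [b for b in bins if len(b) > 1]
```

For example, `plan_compaction([10.0] * 100 + [600.0])` returns two groups, one of 51 files and one of 49, so 100 small files would be rewritten as two larger ones. The 600 MB file is already large enough and is left out of the plan, which avoids rewriting data that is already well sized.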

In this talk, we will cover:
- Apache Iceberg table format
- Problems in the data lake: small files, unorganized files
- Techniques such as partitioning, compaction, and metrics filtering
- The overlapping-metrics problem
- Solving it with sorting and Z-order clustering
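Iceberg manifests record per-column lower and upper bounds for each data file, and a query engine can skip any file whose range cannot satisfy a filter. The plain-Python simulation below (hypothetical names and toy data, not Iceberg's implementation) shows why overlapping ranges defeat this metrics filtering and how sorting and Z-ordering restore it: 256 rows with two integer columns `x` and `y` are written into 16 files of 16 rows under three layouts, and `files_scanned` counts the files that survive min/max pruning for an equality predicate. In Iceberg, such layouts are typically produced by a sort-based rewrite, for example Spark's `rewrite_data_files` with `strategy => 'sort'` and a `zorder(...)` sort order.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (x, y): two hypothetical filter columns


def z_value(x: int, y: int, bits: int = 4) -> int:
    """Interleave the bits of x and y: x fills even bit positions, y odd ones."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def to_files(rows: List[Row], rows_per_file: int = 16) -> List[List[Row]]:
    """Split rows, in their current order, into fixed-size toy 'data files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_scanned(files: List[List[Row]], col: int, value: int) -> int:
    """Count files that cannot be pruned for the predicate `col == value`.

    A file is skipped only when `value` lies outside its [min, max] range for
    that column, mimicking min/max metrics filtering.
    """
    scanned = 0
    for f in files:
        lo = min(r[col] for r in f)
        hi = max(r[col] for r in f)
        if lo <= value <= hi:
            scanned += 1
    return scanned


# 256 rows covering a 16 x 16 grid of (x, y) values.
grid: List[Row] = [(x, y) for x in range(16) for y in range(16)]

# Arrival order unrelated to either column: file ranges overlap heavily.
arrival_files = to_files([grid[(i * 37) % 256] for i in range(256)])
# Linear sort on (x, y): tight x ranges, but every file spans all of y.
linear_files = to_files(sorted(grid))
# Z-order sort: each file covers a 4 x 4 block of (x, y), so both columns prune.
z_files = to_files(sorted(grid, key=lambda r: z_value(r[0], r[1])))

for name, files in [("arrival", arrival_files), ("linear", linear_files), ("z-order", z_files)]:
    print(name, "x==5:", files_scanned(files, 0, 5), "y==5:", files_scanned(files, 1, 5))
```

With this data, the arrival-order layout must scan all 16 files for `x == 5` because every file's `x` range overlaps the value. Sorting linearly on `(x, y)` cuts `x == 5` to a single file but still scans all 16 for `y == 5`, while the Z-order layout scans 4 files for either predicate: it gives up some pruning on the leading column in exchange for useful pruning on both.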

Optimizing Data Files in Apache Iceberg: Performance strategies

Data Lakehouse

Apache Iceberg

Meta Data

Z Order

Clustering

Partitioning

Metrics

Data Analytics

Data Architecture

Data Governance
