Name: Building an Open Data Lake House Using Trino and Apache Iceberg
Start: 2023-04-18T18:00:00Z
End: 2023-04-18T18:00:34.000Z
Location: BrightTALK
Rating: 4.5

Starburst’s mission is to free our customers to see the invisible and achieve the impossible. Join us for high value content, insightful conversations, and the constant opportunity to learn. 

In today's landscape, digital transformation to provide seamless customer journeys is critical to long-term success. Retailers and consumer packaged goods (CPG) companies need to leverage data to drive insights, remain competitive, and build excellent, direct-to-consumer experiences. Analysts grapple with the sheer volume of data generated every second in modern retail, especially as e-commerce experiences expand across the industry. With increasing pressure to innovate and scale, it can be challenging for organizations to digitally transform efficiently and effectively to meet customer demands.

Join us to learn how retail and CPG organizations can place Starburst at the heart of their data strategy to revolutionize customer experiences and mitigate operational gaps.

In this webinar, we'll cover:

- Digital transformation revolutionizing the Retail CPG industry
- How innovators are rising to the challenge
- How Starburst Data Lake Analytics Platform can support your organization’s customer data strategy for long-term success

Revolutionizing the Customer Journey in Retail & CPG

Watch on-demand for an engaging session filled with expert insights and live Q&A. Take the first step towards maximizing query performance with Starburst Enterprise’s managed statistics.

Maximizing query performance with managed statistics in Starburst Enterprise

According to the Sixth Annual Gartner Chief Data Officer Survey, CDOs who successfully increased data sharing led data and analytics (D&A) teams that were 1.7 times more effective at showing demonstrable, verifiable value to D&A stakeholders. 

Data products abstract away the complexity of data storage for consumers. For data engineers, however, it makes sense to take advantage of AWS tools and capabilities to optimize for speed and efficiency. Utilizing the power of Starburst Galaxy, you can operationalize an AWS data lake and manage it for the purpose of data analytics. Starburst Galaxy provides fast access and flexible data product management without adding the complexity of data movement.

Implementing a data lake house architecture with Starburst Galaxy on AWS capitalizes on the low-cost object storage of Amazon Simple Storage Service (Amazon S3) and the ability to load all types of data, while implementing the data warehousing principles of performance, reliability, and ease of use. A data lakehouse allows you to optimize your data architecture to meet specific organizational needs through the balance of cost-based optimizations and scalability, while also implementing a reporting structure to operationalize your analytics. At the same time, because Starburst can connect to and query multiple modern and legacy enterprise sources, it allows data lake users to only pay for what they use and minimize data duplication.


In this webinar, we'll cover:

- What are data products?
- Imperatives for building great data products
- Benefits for Data Producers and Consumers
- Best practices for data products creation and usage on AWS
- A product demo

Extracting the Full Business Value of Data with Starburst Data Products on AWS

The move to the cloud and pay-as-you-go consumption models give IT leaders more flexibility to scale expenses up or down. But when you’re running an application in the cloud, that’s just part of the application’s total workload. Starburst Warp Speed sets a new benchmark in data lake analytics, empowering organizations to more quickly and efficiently derive greater insights from their data.

In this webinar, Russell Christopher, Director of Product Strategy, and Guy Mast, Product Manager, will demonstrate a simplified environment wherein a single infrastructure significantly reduces costs and improves query response times with:

- Speed query performance with smart indexing that ensures data is active and available for analysis, reducing query response times by up to 7x.
- Adapt to business requirements with elastic resource management that can automatically cache frequently accessed data to speed performance and optimize for cost.
- Reduce operational costs with workload-level monitoring that detects hot data and bottlenecks, saving customers up to 40% or more on cloud compute costs.

Warp speed - Setting a New Standard of Data Lake Analytics

Building a modern lakehouse has become easier with new technological advances bringing database-type functionality to the data lake. In this session, we’ll discuss the history of data lakes, the importance of the advances in open table formats and query engines in the last few years, and where we predict the future will lead us in providing self-serve analytics to enterprises of all sizes. 

Topics covered: 
- How do we define a data lakehouse
- How to start a data lakehouse strategy
- Implementing a data lakehouse

Building a Modern Lakehouse

As data architectures evolve there is a big question to answer: How do we best evolve people, too? Is it more pragmatic to adopt a “Modern Data Stack” which is iterative and requires less people and process change? Is it better to take a more holistic approach, which Data Mesh proposes, pushing people & technology forward at once? What is the best next step toward companies becoming data-driven?


This fireside chat, moderated by Justin Borgman, Starburst Co-Founder & CEO, will bring the stars of the data world to the table to discuss the hard and soft side of data: People and technology.

Data disrupted: A fireside chat

In a landmark research study by Boston Consulting Group, the macro trends shaping the data & analytics imperative are explored. As analytics use cases proliferate, and more data is created, how will companies illuminate dark data and make it easy to consume? Given today's economic uncertainty, how can companies solve for faster data consumption while finding a path to analytics cost control?

Join Pranay Ahlawat, Partner and Associate Director, Enterprise Software & Cloud at BCG as he presents their latest findings on how companies can derive the most value from an organization's single most important asset: Data. Pranay will be joined by Steven Huels, Senior Director, AI Product Management and Strategy at Red Hat, and Adrian Estala, Field Chief Data Officer at Starburst.

The future of data: A BCG study

What is Data Mesh? Join Starburst for an introduction to this modern approach to managing analytics at scale. This is the first installment of a series on Data Mesh. Coined by Zhamak Dehghani, Principal Technology Consultant at ThoughtWorks, Data Mesh embraces decentralization over centralization as a core architectural approach, allowing companies to become more efficient in accessing and exploiting data. Data Mesh addresses the flaws in monolithic data warehouse models.
We’ll cover:

- The foundations of Data Mesh, what it is and how it works
- How to rethink organizational, architectural, and technological assumptions to get the best out of your data team and your data
- Why Starburst and Trino are essential to your Data Mesh

Starburst is the analytics engine for Data Mesh. If you are moving towards adopting a Data Mesh architecture we want to be there to help.

Data Mesh 101: What It Is & Why You Need It

Let’s face it. For all the incredible innovations in modern BI, we still face limitations when it comes to live connections to large enterprise sources. Starburst provides the single point of live access to all enterprise data, wherever it resides, through Trino’s ANSI-SQL MPP engine. With big data access, and used in combination with ThoughtSpot, users can query and analyze billions of rows of data across many sources at speeds thought unimaginable. Starburst and ThoughtSpot give business analysts the tools to ask new questions with easy expanded access to new data sources.

In this session, Tom Nats, Director of Customer Solutions at Starburst, and Sean Zinsmeister, VP of Product Marketing at ThoughtSpot, will share how ThoughtSpot, used in conjunction with Starburst, opens new avenues by incorporating new data sources and performing high-speed queries at scale.

Bringing Big Data to Enterprise BI with Starburst & ThoughtSpot

Data architecture-as-a-service, or DAaaS, is a new self-service paradigm that is ideal for data meshes. It empowers local data owners to create architecturally compliant data repositories, domains, and pipelines without IT assistance. It is the culmination of self-service, where business units liberate themselves almost entirely from enterprise IT. If done right, DAaaS eliminates data silos, reduces data bottlenecks, eases the burden on enterprise data teams, and empowers local domains to service their own data needs. It’s also a key ingredient in the data mesh, an emerging distributed architecture for data ownership and management.
 
Data architecture-as-a-service is a verbal twist on cloud processing environments, such as software-as-a-service or platform-as-a-service. This moniker conveys that it’s possible to abstract architecture and build it into easy-to-use, customer-facing tools. When we abstract data architecture, we solve the most enduring data pain point in the data world: the proliferation of data silos and pipelines that wreak havoc on data consistency and trustworthiness.
 
You will learn:  
• What DAaaS is
• Why DAaaS is critical for governed self-service
• How DAaaS prevents data silos and empowers data domains

About the speaker:
Wayne Eckerson is an international thought leader in data and analytics who thinks critically, writes clearly, and presents persuasively about complex topics. He is a best-selling author, sought-after consultant, and noted speaker. Eckerson has advised a range of companies about how to implement data and analytics programs, architectures, and infrastructure, including Walmart, New Balance, and Children’s Hospital of Philadelphia. Eckerson is President of Eckerson Group, a consulting and research firm that helps organizations get more value from their data. He has degrees from Williams College and Wesleyan University.

Unlocking the value of data mesh with Data Architecture as a Service

Data architecture provides a blueprint for how an organization can collect, store, manage, and utilize its data to meet business objectives. It defines the rules, standards, and guidelines for managing data as a valuable organizational asset and supports the integration of data from disparate sources into a coherent and usable whole.

Anti-patterns of data architectures are common mistakes and pitfalls that data architects may encounter while designing and implementing data solutions. These anti-patterns can lead to suboptimal performance, data inconsistencies, and high maintenance costs. Examples include over-normalization and complex data models, among others.

This webinar from data industry thought leader Dr. Pragyansmita Nayak will identify and analyze the most prevalent anti-patterns in data architectures, along with their root causes and potential solutions. By understanding these anti-patterns, data architects can systematically avoid them and design more effective and efficient data solutions while continuing to meet the needs of their organizations.

Speaker bio:
Dr. Pragyansmita Nayak is the Chief Data Scientist at Hitachi Vantara Federal (HVF). She explores the "Art to the Science" of solution architectures orchestrating data, APIs, algorithms and applications. She has more than 24 years of experience in software development and data science (Analytics, Machine Learning and Deep Learning). She has led projects for several Federal Government agencies (DoD/Civilian) in the domain of Federal Accounting, Operational Analytics, Data Fabric, Object Storage, Metadata Management, Records Management and Data Governance. She holds a Ph.D. in Computational Sciences and Informatics from GMU (Fairfax, VA) and a Bachelor of Science in Computer Science. For more information on Pragyan's professional experience, please visit her LinkedIn profile at https://www.linkedin.com/in/pragyansmita and Twitter profile at https://twitter.com/SorishaPragyan.

Anti-patterns of data architectures

Many businesses are looking to gain more impactful insights out of their data. Snowflake is a modern data analytics platform many businesses use to help gain those insights. However, going from the starting point of knowing what you want your data to provide to actually extracting valuable business insights is challenging. Whether you’re trying to figure out how to collect all of your disparate data from various sources or you’re struggling to analyze your data effectively - it’s hard work. It takes a ton of time, resources, and expertise to create and build the infrastructure required to collect the data and analyze it effectively.

Luckily, Growth Acceleration Partners (GAP) can help overcome those challenges. GAP has a team with the knowledge, expertise, and experience to help your business get what you want out of your data using Snowflake and Microsoft Azure. Learn how GAP can help any company accelerate their timeline and get to valuable data insights faster.

What you’ll get out of this webinar:
- A detailed understanding of what it takes to build the infrastructure required for a modern data analytics platform with Snowflake and Azure
- A better understanding of why Snowflake is the right choice for your data platform needs
- An overview of how the GAPAccelerator helps you get value out of your data faster
- Technical details of how the infrastructure is built and what goes into it
- Description of how the experts at GAP can work with your team to provide continuous data analysis after implementation

Accelerating Time to Value with Snowflake

Data management methods are evolving quickly, as enterprises invest in gaining data agility to accelerate insightful decision-making. Modern data management is being driven by an accelerated shift to data in the cloud and the subsequent innovation in data technologies and advanced analytics. Enterprises are recognizing new opportunities to derive value from their data and to save time and money, and they’re taking a fresh look at new data management approaches to reap the rewards of smarter and accelerated decision-making. 

In response to this demand for more modern data treatments, a confusing array of architectures, technologies, and approaches has sprung up – and it’s not easy to tell which ones will truly deliver better business outcomes and which ones are just hype. Should you invest in graph technology and metadata management? What exactly is a data fabric? How can you leverage the power of AI and machine learning? 

This session will take a look at trends in data management that are worth investigating, and explain how a modern data platform can help you implement them in a way that delivers business value for your enterprise.

Data Platform Capabilities for Modern Data Management Architectures

When organizations build new applications and services, they often need scalable and lightning-fast data management platforms to support their innovations. What does it take to support new, innovative applications? Register for this tech talk to hear from Fern Halper, VP of Research at TDWI Research, who specializes in advanced analytics, along with representatives from Redis and Ekata, a Mastercard company, as they discuss one such use case: supporting Ekata’s smart identity verification solution to detect fraud. 

Presented by: 
• Henry Tam, Principal Solutions Marketing Manager | Redis
• Fern Halper, Ph.D., VP Research | TDWI
• Milena Babayev, Product Marketing Manager | Ekata
• Jason Frazier, Software Engineering Manager | Ekata

Real-time, Scalable Applications Powered by a Modern Data Platform

In today’s rapidly evolving business landscape, data plays a critical role. Modern data architecture provides the tools and practices to harness the power of data and turn it into strategic advantage. Whether it is data lakehouse, data fabric, data mesh or cloud data governance, modern data architecture is enabling business leaders to make better decisions, quicker.
 
In this talk, Rajeev Pai, Director of Technology Strategy & Transformation at Deloitte, will explore what modern data architecture entails and how it can be used to unlock business value.
 
Key takeaways:  
- The need for newer ways of bringing, processing and distributing data in the enterprise.
- Leading patterns and practices organizations are adopting in this journey, with examples.
- Challenges organizations may face when adopting some of these new approaches.
- How to overcome those challenges.

Whether you are a business leader looking to drive growth and innovation or a data professional seeking to help your business, this talk will provide valuable insights from the front line and from research.

About the speaker:
Rajeev is a leader in the space of Technology Consulting & Transformation with over 20 years of experience across the globe in the financial services industry, helping business leaders navigate change. He combines deep domain expertise in Capital Markets F2B Processes and Banking to provide thought leadership and tailored advice while challenging business leaders to achieve their vision. He is experienced in large scale System Integration, Platform Re-engineering, Enterprise Architecture, Enterprise Data Management (Data Strategy, Architecture and Management), Analytics & Data Visualisation, Operating Model constructs, Design Thinking, and has strong familiarity with Emerging Tech. He is certified in Business Sustainability Management by the Cambridge Institute for Sustainability Leadership (CISL).

Data-driven success: How modern data architecture unleashes business value

Data lineage and knowledge graphs are data management capabilities that many companies have implemented or would like to implement. These capabilities assist in getting value from data and sustaining competitive advantage. However, several challenges with the implementation of these capabilities exist:
- Data management professionals have different views on these capabilities. Their definitions are not aligned within the data management community. These capabilities have a lot in common.
- The implementation of these capabilities is time- and resource-consuming. The correctly defined scope is one of the implementation success factors.
- Many companies still document data lineage and knowledge graphs manually in Excel.
- Many automated solutions exist. However, finding a proper solution is problematic as providers don’t have an aligned terminology to describe functionality. Often, quite differently labeled solutions deliver similar functionality.

In this session, we will discuss how to solve challenges and:
1. Demonstrate a metamodel of data lineage and knowledge graphs.
2. Show differences and similarities of these capabilities in terms of business drivers, architecture, and use cases.
3. Provide an overview and comparison of various data lineage and knowledge graph IT tools.

About the speaker:
Dr. Irina Steenbeek is a data management practitioner with more than 12 years of experience. The key areas of her professional expertise are the data management maturity assessment, implementation of data management frameworks, and data lineage. Irina has practical experience in software implementation such as ERP and DWH/BI, management consultation, financial and business controls, and data science.

Social links:
1. https://datacrossroads.nl/
2. www.linkedin.com/in/irina-steenbeek

Data Lineage & Knowledge Graphs: Similarities and Differences

As businesses strive to make data-driven decisions, choosing the right data platform has become critical. However, with a plethora of options available, it can be overwhelming to identify the best fit for your business. This webinar offers a comprehensive guide to the most popular data platforms, helping businesses evaluate critical factors such as data processing speed, scalability, cost-effectiveness, security, and integration capabilities to determine the ideal platform.

Participants will gain insight into the advantages and limitations of both cloud-based and on-premise data platforms, and how different modern data platforms can support various business needs and objectives, such as data analytics, machine learning, and big data processing. By the end of this webinar, participants will have a clear understanding of which modern data platform is the best fit for their business context and objectives.

Key Takeaways:
1. Understand the strengths and weaknesses of the most popular data platforms available in the market today.
2. Learn how to evaluate crucial factors, such as data processing speed, scalability, cost-effectiveness, security, and integration capabilities while selecting a data platform for your business.
3. Recognize how different data platforms can support diverse business needs and objectives, including data analytics, machine learning, and big data processing.
4. Identify the advantages and challenges of cloud-based and on-premise data platforms, and get guidance on how to choose between them.

About the speaker:
Gopinath Manimayan is lead architect of digital experience at Moonraft Innovation Labs. He has more than 15 years of experience crafting exemplary technology solutions. You can always count on him for guidance on the best architecture to suit any need, or for learning about the latest technologies in the market.

Choosing the right data platform for your business

The problem with data warehousing and lakehousing is that they don’t go far enough. They don’t attack the root input problem: dumb, increasingly siloed and duplicative data and stranded, duplicated logic. It’s the proverbial garbage in, garbage out scenario, unless companies allocate their entire innovation budgets to integration.
 
More and more apps create the need for more and more reinvent-the-wheel integration, because each app has its own repository and its own data model. This is one reason why big banks have more than 10,000 databases each.
 
The growing number of apps with their own repositories is one reason we could be in the Yottabyte Era by 2030. If the world has to store two yottabytes of data per year, 40% of the economy could be dedicated to just storing data, 90% of which is duplicated data that’s hard to reuse.
 
For their part, applications duplicate the core description and predicate logic that should live with the data, where it can be shared and reused repeatedly within a knowledge graph.
 
A better, data-centric approach puts data and data models first. A FAIR data approach (smarter data that’s designed to be findable, accessible, interoperable, and reusable) addresses the integration problem up front. No more garbage in, so no more garbage out. Data becomes self-describing with the help of the logic that used to be trapped in applications.
 
This talk will examine how organizations in various industries use semantic knowledge graphs to solve their data integration and analytics problems at a fraction of the cost of data warehousing or lakehousing. It’s an organic approach to data and logic that can eliminate growing amounts of waste and complexity. Companies can run these systems at low cost in parallel with their legacy environments until they commit to graph data model-driven development, which is when even more substantial benefits will accrue.

FAIR data: Superior data visibility and reuse without warehousing

Embedded analytics has become a vital aspect of business intelligence, enabling organizations to gain insights from their data at the point of decision, in their operational applications. 

This webinar will explore the architecture and experience of embedded analytics, showcasing how it can be effectively implemented to drive informed decision-making and enhance user experiences. Join us to learn about the various components, best practices, and real-world applications of embedded analytics in modern data architectures. With this webinar, you will:

- Understand the role of embedded analytics in the context of decision intelligence.
- Gain insights into the architecture of embedded analytics, including its components, scalability, performance and security.
- Explore best practices for designing effective embedded experiences.
- Discover real-world applications and use cases of embedded analytics across various industries.
- Stay informed on future trends and opportunities in the embedded analytics space including natural language.

Embedded Analytics: Architecture and Experience

Data Architecture Best Practices

As companies build their data analytics practice, they quickly outgrow running analytics off the operational store that powers their applications. Building a read replica only buys them time until they hit scalability limits with their growing internal and customer demand. At that point they reach a crossroads: go all in on a cloud data warehouse, or choose an open data lake house approach that future-proofs them for scale, performance, and cost efficiency. In this workshop, Matt Fuller and Tom Nats lead you through how you can easily build and manage an open data lake house architecture using open-source technologies such as Trino and Apache Iceberg to support your growing analytics. Trino is an open-source, highly parallel, distributed query engine built from the ground up at Facebook for efficient, low-latency analytics. Iceberg is an open-source, high-performance table format that enables an engine like Trino to run data warehousing SQL operations such as UPDATE, DELETE, and MERGE on the data lake house. In addition, Matt and Tom will lead you through combining these technologies to perform near real-time analytics, pairing streaming ingestion with database functionality on the data lake house. This workshop will use the Starburst Galaxy SaaS product, making it simple to leverage these technologies for your modern data lake house without having to worry about the operational aspects of running Trino and other software.
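
For readers unfamiliar with what a MERGE does, the following is a minimal pure-Python sketch of the row-level semantics (update matched rows, delete matched rows, insert unmatched rows). It does not use Trino or Iceberg at all; the function `merge_rows`, the `orders` table, and the convention that a `None` payload means "delete" are all hypothetical names and assumptions made for illustration only.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]


def merge_rows(
    target: Dict[int, Row],
    source: Iterable[Tuple[int, Optional[Row]]],
) -> Dict[int, Row]:
    """Return a new table with MERGE-style changes applied, keyed by row id.

    Assumption for this sketch: a source payload of None requests a delete.
    """
    # Copy the target so the original table is left untouched.
    result = {key: dict(row) for key, row in target.items()}
    for key, row in source:
        if row is None:
            result.pop(key, None)      # matched row: delete it
        elif key in result:
            result[key].update(row)    # matched row: update listed columns
        else:
            result[key] = dict(row)    # unmatched row: insert it
    return result


# Hypothetical example data.
orders = {
    1: {"status": "new", "total": 10.0},
    2: {"status": "new", "total": 25.0},
}
changes = [
    (1, {"status": "shipped"}),             # update existing order 1
    (2, None),                              # delete existing order 2
    (3, {"status": "new", "total": 7.5}),   # insert new order 3
]
merged = merge_rows(orders, changes)
```

In a real data lake house the table format and query engine handle these changes transactionally over files in object storage; the sketch only shows the logical outcome of a single merge.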

Building an Open Data Lake House Using Trino and Apache Iceberg

Data Analysis

Data Lake

Open Source

Analytics

SQL server

Data Analytics

Cloud Data

Data Best Practices


Starburst Data