Best of 2019 - Leveraging Streaming and Batch Data Sets for ML Applications

Data Engineering is fast emerging as the most critical function in Analytics and Machine Learning programs. The ability to build and manage data pipelines for streaming and batch data sets is critical for the downstream success of your ML applications.

In this webinar, you will learn how to use Qubole’s cloud-native platform to acquire and transform data sets for data science and analytics, make data sets available to different users, and fully leverage your data lake throughout your organization. Our experts will also walk through a real-world example of how to use Apache Spark and Airflow, as well as Notebooks, to build an end-to-end solution.

Attendees will learn how to:

+ Ingest data to/from a cloud storage data lake
+ Perform interactive data analysis and build AI/ML models
+ Transform data sets with Spark and build interactive dashboards
+ Seamlessly interact with other data sources
+ Deploy an end-to-end data pipeline using Apache Airflow
Recorded Jan 7 2020 31 mins
Presented by
Jorge Villamariona and Ojas Mulay from Qubole

  • The Open Data Lake Talks: Optimizing Costs in A Changing World Recorded: Apr 22 2020 60 mins
    Justin Wainwright, Systems Analyst & Rajit Saha, Director, Data Analytics
    As organizations grapple with the sudden economic turmoil created by the pandemic, there is a critical need to balance cost savings with the need to drive innovation.

    We’ve assembled a panel of thought leaders in analytics and machine learning to understand how their organizations are optimizing cloud costs associated with Data Analytics and Machine Learning during the COVID-19 pandemic.

    This one-hour webcast panel and Q&A will explore the greater need for visibility and real-time insights into data workloads in an increasingly virtualized world. Our panelists will dive into how you can optimize TCO for the short- and long-term and introduce cloud cost-saving efficiencies that stay.

    We want you to walk away from this community discussion with an arsenal of tactics to implement in your organization, including:

    + Top tips for optimizing big data processing costs in the cloud
    + Tactics to implement in order to identify and eliminate wasteful spend
    + Best practices for optimizing TCO in an open data lake

    Moderator
    Mohit Bhatnagar, SVP Product Management, Qubole

    Panelists
    Justin Wainwright, Systems Analyst
    Rajit Saha, Director, Data Analytics
  • Leveraging Qubole Cost Explorer for TCO Optimization Recorded: Apr 20 2020 31 mins
    Dhiraj Sehgal, Director Product Marketing at Qubole, and Sandeep Dabade, Sr. Solution Architect at Qubole
    As enterprises become more data-driven, they need to develop their financial governance muscle for data lakes. Using more compute to get insights and analysis leads to cost overruns and unpredictable future spending.

    This live demo introduces you to Qubole Cost Explorer, which gives Qubole users the ability to monitor, manage, and optimize big data costs by providing granular visibility of workloads at the job, cluster, and cluster instance levels. With Cost Explorer, you can track spending, monitor showback, justify business plans, plan and budget, and build ROI analyses.

    In this webinar you will learn the following:

    + How Qubole users get autonomous and policy-based cost optimization without sacrificing service level agreements (SLAs)

    + The monitoring capabilities that provide granular costs at the workload, application, user, and account level, in addition to cluster and cluster-instance costs
  • Qubole On-Demand: Discover our Open Data Lake Platform Recorded: Mar 20 2020 36 mins
    Minesh Patel, VP Solution Architecture, Qubole & Dhiraj Sehgal, Director Product Marketing, Qubole
    Join our on-demand demo to learn how the most data-driven organizations significantly improve the TCO, performance, and optimization of their cloud data lakes with Qubole.

    This session is for data engineers, analysts, scientists and data team leaders. Learn more about the common use cases that are solved with Qubole and see how we help data-driven teams save $230m+ annually in cloud costs and improve productivity, time to value, and collaboration across their cloud data lakes.

    Learn how to:

    + Drive down the costs of your big data workloads with Qubole
    + Utilize Qubole for your data engineering, analytics and data science use cases
    + Increase efficiency by leveraging cloud technologies that match your workflows
    + Extract more value with Machine Learning and AI
    + Optimize Apache Spark, Presto, Hadoop/Hive, and Airflow usage
    + Use Jupyter and/or Zeppelin Notebooks with Qubole
  • Inside the Brain of Cloud Data Platform Leaders - Ep. 1: Addressing GDPR & CCPA Recorded: Mar 13 2020 59 mins
    Drew Daniels, CISO, Qubole & Akil Murali, Director, PM Data Governance & Security, Qubole
    Welcome to the first episode of Inside the Brain of Cloud Data Platform Leaders (ITB), a webinar series in which we bring you an interactive discussion between Open Data Lake Platform Leaders on topics impacting the community right now.

    ABOUT:
    In this episode, we’re evaluating the SaaS/PaaS data platform from a CCPA and GDPR perspective. We’ll hear from subject experts:

    - Drew Daniels, CISO, Qubole
    - Akil Murali, Director, PM Data Governance & Security, Qubole

    DISCUSSION POINTS:
    - Track data usage and consent across many disparate systems in a data lake;
    - Comply with de-identification requirements for personal information;
    - What it means to have full visibility and control over data platform endpoints and why it has to be the new norm for governance, risk, and compliance
  • Inside the Brain of Cloud Data Platform Leaders - Ep. 1: Addressing GDPR & CCPA Recorded: Jan 29 2020 60 mins
    Drew Daniels, CISO, Qubole & Akil Murali, Director, PM Data Governance & Security, Qubole
    Welcome to the first episode of Inside the Brain of Cloud Data Platform Leaders (ITB), a webinar series in which we bring you an interactive discussion between Data Lake Platform Industry Leaders on topics impacting the community right now. Join this series to hear the opinions and perspectives of industry experts and put your questions to them directly.

    ABOUT:
    In this episode, we’re evaluating the SaaS/PaaS data platform from a CCPA and GDPR perspective. We’ll hear from subject experts:
    - Drew Daniels, CISO, Qubole
    - Akil Murali, Director, PM Data Governance & Security, Qubole

    DISCUSSION POINTS:
    - Track data usage and consent across many disparate systems in a data lake;
    - Comply with de-identification requirements for personal information;
    - What it means to have full visibility and control over data platform endpoints and why it has to be the new norm for governance, risk, and compliance

    Be prepared to bring your questions, as this series is live and interactive with our audience members.
  • Replay: Succeeding with Big Data Analytics and Machine Learning in The Cloud Recorded: Jan 22 2020 48 mins
    James E. Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research
    The cloud has the potential to deliver on the promise of big data processing for machine learning and analytics to help organizations become more data-driven. However, it presents its own set of challenges.

    This webinar covers best practices in areas such as:

    - Using automation in the cloud to derive more value from big data by delivering self-service access to data lakes for machine learning and analytics
    - Enabling collaboration among data engineers, data scientists, and analysts for end-to-end data processing
    - Implementing financial governance to ensure a sustainable program
    - Managing security and compliance
    - Realizing business value through more users and use cases

    In addition, this webinar provides an overview of the capabilities of Qubole’s cloud-native data platform in the areas described above.

    About Our Speaker:

    James Curtis is a Senior Analyst for the Data, AI & Analytics Channel at 451 Research. He has had experience covering the BI reporting and analytics sector and currently covers Hadoop, NoSQL and related analytic and operational database technologies.

    James has over 20 years' experience in the IT and technology industry, serving in a number of senior roles in marketing and communications, touching a broad range of technologies. At iQor, he served as a VP for an upstart analytics group, overseeing marketing for custom, advanced analytic solutions. He also worked at Netezza and later at IBM, where he was a senior product marketing manager with responsibility for Hadoop and big data products. In addition, James has worked at Hewlett-Packard managing global programs and as a case editor at Harvard Business School.

    James holds a bachelor's degree in English from Utah State University, a master's degree in writing from Northeastern University in Boston, and an MBA from Texas A&M University.
  • Best of 2019 - Why You Need a Cloud Platform to Succeed with Big Data Recorded: Jan 21 2020 53 mins
    Matheen Raza and Sandeep Dabade from Qubole
    As the volume, variety, and velocity of data increase, the cloud is the most efficient and cost-effective option for machine learning and advanced analytics. Organizations looking to scale their big data projects can do so with greater ease with a cloud-native data platform.

    Qubole provides a single platform for data engineers, analysts, and scientists that supports multiple use cases -- from machine learning to predictive analytics. The platform saves organizations up to 50 percent in data processing costs by leveraging multiple engines like Apache Spark, Presto, and Hive, and automatically provisions, manages, and optimizes cloud resources.

    Join experts from Qubole as they demonstrate how to get the most out of your data on the cloud. In this webinar, you'll learn:

    - The benefits of a single platform and centralized access to data
    - How to pick the right data processing engines and tools
    - How to save money with intelligent cluster management and financial governance
    - Key considerations to evaluate cloud data platforms
  • Best of 2019 - Succeeding with a Cloud Data Lake - from Architecture to Ops Recorded: Jan 14 2020 44 mins
    Rangasayee Chandrasekaran and Akil Murali from Qubole
    To be successful, data lakes must evolve to support the ever-growing needs of organizations for real-time data; new exploration, discovery and analysis; or batch and streaming data pipelines. Whether you’re thinking about complementing your data warehouse with a data lake, moving your on-premises data lake to the cloud, or already operating a cloud data lake, this webinar is a must-attend.

    We’ll share key lessons learned in the last 18 months working with companies like Gannett, Nextdoor, Expedia, Zillow and others, which are running cloud data lakes at massive scale and delivering remarkable returns. We will also share best practices for building a cloud data lake operation, from people and tools to processes.

    In this webinar, we’ll cover:
    - Benefits of building a data lake in the cloud
    - How to set the foundation for your data lake, including storage, access, metadata, and more
    - Best practices for governing your data lake (privacy, security, financial governance)
    - Tools required for managing and processing data in your data lake
  • 'Best of 2019' (AMERS) - Mastering Data Governance on Cloud Data Lakes Recorded: Dec 18 2019 53 mins
    Dhiraj Sehgal, Director of Product Marketing & Akil Murali, Director of Product Management, Security and Governance at Qubole
    As more organizations run ETL workloads, analytics, and machine learning on data residing in data lakes, there are inherent privacy and integrity risks that must be addressed. How, then, should organizations preserve privacy and control access to this data under regulations such as GDPR and CCPA?

    While most organizations have put some data governance measures in place for their data lakes, current high-level, file-level security measures and accepted best practices are not sufficient to meet data privacy and integrity requirements.

    In this webinar, Qubole data privacy and integrity experts will cover:

    - Maintaining data integrity and keeping sensitive information safe irrespective of open-source engine
    - Providing granular data access controls and the ability to mask data with Apache Ranger
    - Avoiding lost updates, dirty reads, stale reads and enforcing app-specific integrity constraints
    - Complying with “right to be forgotten” and “right to be erased” by ensuring that data in the data lake is current and deleted when necessary
    - Live Q&A with our hosts
  • Best of 2019 - Mastering Data Governance on Cloud Data Lakes Recorded: Dec 17 2019 50 mins
    Dhiraj Sehgal, Director of Product Marketing & Akil Murali, Director of Product Management, Security and Governance at Qubole
    As more organizations run ETL workloads, analytics, and machine learning on data residing in data lakes, there are inherent privacy and integrity risks that must be addressed. How, then, should organizations preserve privacy and control access to this data under regulations such as GDPR and CCPA?

    While most organizations have put some data governance measures in place for their data lakes, current high-level, file-level security measures and accepted best practices are not sufficient to meet data privacy and integrity requirements.

    In this webinar, Qubole data privacy and integrity experts will cover:

    - Maintaining data integrity and keeping sensitive information safe irrespective of open-source engine
    - Providing granular data access controls and the ability to mask data with Apache Ranger
    - Avoiding lost updates, dirty reads, stale reads and enforcing app-specific integrity constraints
    - Complying with “right to be forgotten” and “right to be erased” by ensuring that data in the data lake is current and deleted when necessary
    - A demo of Qubole’s built-in Apache Ranger and ACID support for data privacy and integrity
  • Best of 2019 - Enterprise-Scale Big Data Analytics on Google Cloud Platform Recorded: Dec 10 2019 56 mins
    Naveen Punjabi from Google & Anita Thomas from Qubole
    As companies scale their data infrastructure on Google Cloud, they need a self-service data platform with integrated tools that enables easier, more collaborative processing of big data workloads.

    Join Qubole and Google experts to learn:

    - Why a unified experience with native notebooks, a command workbench, and integrated Apache Airflow is a must for enabling data engineers and data scientists to collaborate using the tools, languages, and engines they are familiar with.

    - The importance of enhanced versions of Apache Spark, Hadoop, Hive and Airflow, along with dedicated support and specialized engineering teams by engine, for your big data analytics projects.

    - How workload-aware autoscaling, aggressive downscaling, intelligent Preemptible VM support, and other administration capabilities are critical for proper scalability and reduced TCO.

    - How you can deliver day-1 self-service access to process the data in your GCP data lake or BigQuery data warehouse, with enterprise-grade security.
  • Mastering Data Governance on Cloud Data Lakes with Multiple Engines Recorded: Nov 20 2019 51 mins
    Dhiraj Sehgal, Director of Product Marketing & Akil Murali, Director of Product Management, Security and Governance at Qubole
    As more organizations run ETL workloads, analytics, and machine learning on data residing in data lakes, there are inherent privacy and integrity risks that must be addressed. How, then, should organizations preserve privacy and control access to this data under regulations such as GDPR and CCPA?

    While most organizations have put some data governance measures in place for their data lakes, current high-level, file-level security measures and accepted best practices are not sufficient to meet data privacy and integrity requirements.

    In this webinar, Qubole data privacy and integrity experts will cover:

    - Maintaining data integrity and keeping sensitive information safe irrespective of open-source engine
    - Providing granular data access controls and the ability to mask data with Apache Ranger
    - Avoiding lost updates, dirty reads, stale reads and enforcing app-specific integrity constraints
    - Complying with “right to be forgotten” and “right to be erased” by ensuring that data in the data lake is current and deleted when necessary
    - A demo of Qubole’s built-in Apache Ranger and ACID support for data privacy and integrity
  • Succeeding with a Cloud Data Lake - from Architecture to Operations Recorded: Nov 7 2019 45 mins
    Rangasayee Chandrasekaran and Akil Murali from Qubole
    To be successful, data lakes must evolve to support the ever-growing needs of organizations for real-time data; new exploration, discovery and analysis; or batch and streaming data pipelines. Whether you’re thinking about complementing your data warehouse with a data lake, moving your on-premises data lake to the cloud, or already operating a cloud data lake, this webinar is a must-attend.

    We’ll share key lessons learned in the last 18 months working with companies like Gannett, Nextdoor, Expedia, Zillow and others, which are running cloud data lakes at massive scale and delivering remarkable returns. We will also share best practices for building a cloud data lake operation, from people and tools to processes.

    In this webinar, we’ll cover:
    - Benefits of building a data lake in the cloud
    - How to set the foundation for your data lake, including storage, access, metadata, and more
    - Best practices for governing your data lake (privacy, security, financial governance)
    - Tools required for managing and processing data in your data lake
  • Best Practices: How To Build Scalable Data Pipelines for Machine Learning Recorded: Oct 10 2019 41 mins
    Jorge Villamariona and Pradeep Reddy, Qubole
    Data engineers today serve a wider audience than just a few years ago. Companies now need to apply machine learning (ML) techniques to their data in order to remain relevant. Among the new challenges faced by data engineers is the need to build and fill data lakes and to reliably deliver complete, large-volume data sets so that data scientists can train more accurate models.

    Aside from dealing with larger data volumes, these pipelines need to be flexible in order to accommodate the variety of data and the high processing velocity required by the new ML applications. Qubole addresses these challenges by providing an auto-scaling cloud-native platform to build and run these data pipelines.

    In this webinar we will cover:
    - Some of the typical challenges faced by data engineers when building pipelines for machine learning.
    - Typical uses of the various Qubole engines to address these challenges.
    - Real-world customer examples
  • Key Differences Between On-Prem and Cloud Data Platforms Recorded: Oct 3 2019 47 mins
    Purvang Parikh, Qubole
    Cloud service models have become the new norm for enterprise deployments in almost every category, and big data is no exception. The separation of storage and compute in the cloud affords unparalleled scale, efficiency, and economics compared to on-premises solutions.

    If you are using Cloudera, Hortonworks or MapR, you should attend this webinar to learn the key differences between on-premises and cloud solutions, considerations for selecting cloud data lakes and data warehouses, and how to build the right architecture for your organization's analytics and machine learning needs.

    In this webinar, we’ll cover:

    - The difference between hosting an on-premises data platform in the cloud and adopting a cloud-native architecture for data processing in the cloud
    - How a cloud data lake architecture differs from cloud data warehouses
    - How to move your data to the cloud and leverage big data engines like Apache Spark, Presto, Hive and more
    - Avoiding security and cost pitfalls that can derail your migration to the cloud
    - Demo of Qubole’s cloud-native platform
  • Right Tool for the Job: Using Qubole Presto for Interactive and Ad-Hoc Queries Recorded: Oct 3 2019 57 mins
    Goden Yao, Product Manager at Qubole
    Presto is the go-to query engine of Qubole customers for interactive and reporting use cases due to its excellent performance and ability to join unstructured and structured data in seconds. Many Qubole customers use Presto along with their favorite BI tools, such as PowerBI, Looker and Tableau to explore data and run queries.

    Two key criteria to look for in a query engine for interactive analytics are performance and cost. You want best-in-class performance to meet the short deadlines of interactive workloads, while reducing and/or controlling costs.

    Qubole Presto is a cloud-optimized version of open source Presto, with enhancements that improve performance, reliability and cost. In this webinar, we’ll cover:

    - When to use Presto versus other engines like Apache Spark
    - How to enable self-service access to your data lake
    - The key advantages of Qubole Presto over Open Source Presto
    - Live demo of running interactive and ad hoc queries using Qubole Presto
    - How customers like iBotta, Tivo and Return Path leverage Qubole Presto
  • Right Tool for the Job: Running Apache Spark at Scale in the Cloud Recorded: Sep 26 2019 49 mins
    Ashwin Chandra Putta, Sr. Product Manager at Qubole
    Apache Spark is a powerful open source engine used for processing complex, memory-intensive workloads. However, running Apache Spark in the cloud can be complex and challenging. Qubole has re-engineered Apache Spark, optimizing its performance and efficiency while reducing administrative overhead. Today, Qubole runs some of the world’s largest Apache Spark clusters in the cloud.

    In this webinar, we’ll take a deeper look at the use cases for Apache Spark, including ETL and machine learning, and compare Apache Spark on Qubole versus Open Source Apache Spark. We’ll cover:

    - Why Apache Spark is essential for big data processing
    - How to deploy Spark at scale in the cloud and enable all data users
    - The enhancements made to Qubole Spark
    - A live demo and real-world examples of Apache Spark on Qubole
  • Leveraging Streaming and Batch Data Sets for ML Applications Recorded: Sep 25 2019 32 mins
    Jorge Villamariona and Ojas Mulay from Qubole
    Data Engineering is fast emerging as the most critical function in Analytics and Machine Learning programs. The ability to build and manage data pipelines for streaming and batch data sets is critical for the downstream success of your ML applications.

    In this webinar, you will learn how to use Qubole’s cloud-native platform to acquire and transform data sets for data science and analytics, make data sets available to different users, and fully leverage your data lake throughout your organization. Our experts will also walk through a real-world example of how to use Apache Spark and Airflow, as well as Notebooks, to build an end-to-end solution.

    Attendees will learn how to:

    + Ingest data to/from a cloud storage data lake
    + Perform interactive data analysis and build AI/ML models
    + Transform data sets with Spark and build interactive dashboards
    + Seamlessly interact with other data sources
    + Deploy an end-to-end data pipeline using Apache Airflow
  • Mastering Data Discovery on Cloud Data Lakes Recorded: Sep 19 2019 44 mins
    Rangasayee Chandrasekaran, Product Manager, Qubole
    In order to capture and analyze new and different types of data, corporations are augmenting their data warehouses and data marts with cloud data lakes. Certainly, capturing new and different types of data is important, but providing access to all users, providing tools that allow them to work the way they already do, and deriving value from those datasets remain the ultimate goals.

    In this webinar, we will outline data processing challenges faced by analysts in the enterprise and give a live demo of Qubole's Workbench, a powerful user interface that reduces time-to-insight by extending Qubole's multi-engine capabilities to data analysts and data scientists. Workbench enables data discovery combining unstructured, semi-structured, and structured data in data lakes or data warehouses for analytics, machine learning, or processing with engines such as Apache Spark.

    Attendees will learn:
    -- Common data processing challenges for analytics
    -- The value of data lakes
    -- Best practices for working with structured and semi-structured datasets
    -- When to use Apache Spark, Presto and other engines
The Open Cloud Data Lake Platform
Tune in to hear open data lake platform leaders and engineers discuss continuous data engineering on data lakes for machine learning, streaming analytics, ad-hoc analytics, and data exploration in the cloud.

The interactive talks are designed for data engineers, data analysts, and data scientists who want to learn about some of the challenges and solutions for use cases seen in data-driven organizations.

Learn more about Qubole: http://bit.ly/AboutQubole

  • Title: Best of 2019 - Leveraging Streaming and Batch Data Sets for ML Applications
  • Live at: Jan 7 2020 10:00 am
  • Presented by: Jorge Villamariona and Ojas Mulay from Qubole