How to Choose the Right Cluster for Your Workload - A Cluster Comparison

Several types of clusters are available in the cloud. In this webinar, Alex Aidun walks through the use cases for each cluster configuration and helps you select the right one. Stay tuned as he then walks you through the process of launching your own clusters.

What you will learn:
- How Hadoop, Presto, and Spark clusters differ
- Which decision points to consider when selecting a cluster
- How to get started launching clusters
Recorded Sep 13 2017 22 mins
Presented by
Alex Aidun, Director of Education Services, Qubole

  • Spark Structured Streaming on the Cloud: Introduction to Internals Recorded: Feb 7 2018 30 mins
    Vikram Agrawal
    Register now to see the on-demand recording of this webinar.

    Apache Spark has been rapidly gaining steam, both in the headlines and in real-world adoption. Spark was developed in 2009 and open sourced in 2010. Since then, it has grown to become one of the largest open source communities in big data, with over 200 contributors from more than 50 organizations. This open source analytics engine stands out for its ability to process large volumes of data significantly faster than contemporaries such as MapReduce, primarily owing to in-memory storage of data on its own processing framework. One of the top real-world industry use cases for Apache Spark is its ability to process ‘streaming data’.

    With so much data being processed on a daily basis, it has become essential for companies to be able to stream and analyze it all in real time, and Spark Streaming has the capability to handle this extra workload. Some experts even theorize that Spark could become the go-to platform for stream-computing applications, no matter the type. The reason for this claim is that Spark Streaming unifies disparate data processing capabilities, allowing developers to use a single framework to accommodate all their processing needs. Among the general ways that Spark Streaming is being used by businesses today are Streaming ETL, Data Enrichment, Trigger Event Detection and Complex Session Analysis. In this webinar, we will cover an introduction, internals and industry use cases of ‘Structured Streaming in Spark’.

    Key Takeaways:
    - Understanding of Data Processing Architecture
    - Why and When to use Spark’s Structured Streaming
    - Spark’s Structured Streaming Programming Paradigm
    - Internals of Spark’s Structured Streaming
    - Spark Structured Streaming in the Real World – examples of how customers of Qubole use it
  • What's Ahead in Big Data and Analytics Recorded: Dec 12 2017 61 mins
    Paul Nelson, Leena Joshi, and Balaji Mohanam
    We have come a long way since the term "Big Data" swept the business world off its feet as the next frontier for innovation, competition and productivity. Hadoop, NoSQL and Spark have become members of the enterprise IT landscape, data lakes have evolved as a real strategy and migration to the cloud has accelerated across service and deployment models.

    On the road ahead, the demand for real-time analytics will continue to skyrocket alongside growth in IoT, machine learning, and cognitive applications. Meeting the speed and scalability requirements of these types of workloads requires more flexible and efficient data management processes – both on-premises and in the cloud. Flexible deployment and integration options will become a must-have for projects.

    Finally, the need for data governance and security is intensifying as businesses adopt new approaches to expand their data storage and access via data lakes and self-service analytics programs. As data, along with its sources and users, continues to proliferate, so do the risks and responsibilities of ensuring its quality and protection.

    Join us to watch the replay of "What's Ahead in Big Data and Analytics" to get real direction and practical advice on the challenges and opportunities to tackle in 2018.
  • Power your Big Data Infrastructure with Data Intelligence for Analytics and Data Recorded: Nov 15 2017 46 mins
    Balaji Mohanam, Senior Product Manager, Qubole
    Discover the newly launched features in Qubole, powered by Data Intelligence, that automate mundane Data Model performance appraisal and simplify Data Ops. This session will provide a detailed walkthrough of Qubole’s latest offering in Data Intelligence, which includes Data Model insights and recommendations on Partitioning, Formatting, and Sorting that help optimize data models for improved performance and use of computing resources. In addition, learn about Qubole’s latest offering in self-service analytics and how it can improve analysts’ productivity by making data discovery easy through column and table name auto-suggestion and completion, and insights preview.
  • Fireside Chat: Lessons Learned from Facebook Recorded: Sep 22 2017 48 mins
    Ashish Thusoo, CEO/Co-Founder, Qubole & Horia Margarit, Resident Data Scientist, Qubole
    In the final session of Data Platforms Online 2017, Ashish Thusoo will offer some of his highlights from the week’s sessions, pick out some emerging themes and trends, and answer questions from the audience. Ashish built the original data team at Facebook, is a co-author of Apache Hive, co-author of “Building a Data Driven Enterprise with DataOps” and CEO of Qubole. He’ll be moderated by Horia Margarit, resident Data Scientist at Qubole. Get your questions ready for what will be a lively and entertaining discussion!
  • Big Data Trends & Oracle Cloud Infrastructure Recorded: Sep 22 2017 36 mins
    Andrew Reichman, Sr. Director of Cloud Strategy, Oracle
    Cloud has changed the game when it comes to data analytics. Previously, organizations had to lock themselves into a particular architecture and level of capacity for three to seven years and do all the lifting themselves. Cloud, on the other hand, allows them to experiment with different hardware and software options, get more of the solution as a service, and scale up and down to meet project spikes and accelerate busy jobs at will. This makes it much more viable for any company to get the advantages of advanced analytics against large data sets, without an oversized IT staff or huge capital investments.

    Oracle cloud is specifically designed to help enterprises take advantage of cloud for data analytics—it offers massive non-variable performance, predictable low cost and broad choice of deployment and software options. Oracle and Qubole work together to deliver a new breed of data platform—capable of taming the scale, performance, cost and complexity issues associated with gaining business insight from data of all types.

    Watch this webinar to understand:
    - Summary of industry trends for big data on the cloud
    - How Oracle Cloud Infrastructure is optimized for big data workloads from a cost, performance and flexibility perspective
    - How Oracle Cloud Big Data solutions compare with on-premises and competing cloud options
  • Untangling the Cloud Services Hairball Recorded: Sep 22 2017 25 mins
    James Curtis, Senior Analyst - Data Platforms & Analytics, 451 Research
    The question is not so much whether to migrate to the cloud or not. That question has likely already been answered by many organizations, and the answer is a resounding full steam ahead. But the start of the journey can be daunting, especially with a lot of ‘as-a-service’ terminology floating around. Please join James Curtis, senior analyst at 451 Research, as he discusses not only some industry trends and what many organizations are doing but also a simplified approach to understanding cloud services and how that might best fit your organization. Because it’s not so much buyer beware; it’s more about buyer understand.
  • BS-free Data Science Recorded: Sep 22 2017 45 mins
    Aman Naimat, Senior Vice President, Technology, Demandbase
    There is a surge in hype around Artificial Intelligence. Startups are raising hundreds of millions of dollars by bedazzling investors with Deep Learning, word embeddings, and reinforcement learning. This is a distraction from the very real problems that data and AI can solve if done right. By working across dozens of machine learning problems that are live in the real world, I’ve worked out the most common problems encountered and recurring design patterns on how to solve real-world problems using AI as a tool. This talk will arm you with a perspective on how to get pragmatic solutions with AI today.
  • Get "Datopia": Transforming to an Agile Data Culture Recorded: Sep 22 2017 35 mins
    Tripp Smith, Chief Technical Officer, Clarity
    This webinar is focused on:
    - Increasing collaborative friction between engineers, analysts, and business
    - Process-driven iteration i.e. balancing agility with discipline
    - Making the quantitative business case for moving from big bang to continuous enhancement (convincing your CFO/CIO to shift from CapEx to OpEx)
    - Case studies and outcomes from our clients
  • Modern Data Architecture with AWS Recorded: Sep 21 2017 29 mins
    Pratap Ramamurthy, Partner Solution Architect, Amazon Web Services
    Today’s organizations are tasked with managing multiple data types, coming from a wide variety of sources. Faced with massive volumes and heterogeneous types of data, organizations are finding that in order to deliver insights in a timely manner, they need a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. Every use case might be different, and different use cases might need different tools. AWS provides a variety of options for your needs, including RDS, EMR, Redshift, Athena and QuickSight. In this talk we will discuss the different technologies available on AWS and their applications.
  • Amplifying Retail with Big Data and The Cloud Recorded: Sep 21 2017 29 mins
    Carter Bradford, Senior Vice President, Precocity
    Once considered the "black magic" of digitally-born retailers like Amazon, personalizing the customer experience has now become table stakes for any retailer interested in surviving in the era of digital transformation. The techniques, tools and scalable platforms necessary to optimize customer interactions are now available and accessible for use by companies of any shape or size. We'll discuss how the use of big data technology in the cloud eases the implementation of common retail use cases as well as how it helps to avoid typical pitfalls.
  • Bringing DevOps to the World of Data Science Recorded: Sep 21 2017 38 mins
    Sridhar Alla, Big Data Architect, Comcast
    We will look at how DevOps is making Data Science more mainstream with automation, release trains, agility and operational readiness. In this talk, we will look at the various tools and techniques in building a successful Data Science practice and how DevOps can be introduced to provide Continuous Integration, Delivery, and Deployment of Data Science Models.
  • Real-time Analytics on Streaming Data with Azure Stream Analytics Recorded: Sep 21 2017 43 mins
    Krishna Mamidipaka, Sr. Program Manager, Microsoft Azure
    Continuous streams of data are generated in every industry from sensors, manufacturing IoT devices, business transactions, social media, network devices, clickstream logs, and more. Found within these streams of data are critical business insights that are waiting to be unlocked. Attend this session and learn how customers are creating solutions for fleet monitoring, smart grid, network monitoring, recommendations, and other real-time solutions to analyze multiple concurrent streams of data-in-motion into insights and actions for competitive advantage.

    In this session you will see demos and learn how services like Azure Event Hubs, Stream Analytics, Machine Learning, and other Azure services work seamlessly together to create your end to end real-time analytics solutions.
  • Azure Machine Learning and R to Speed-up Data Science Projects Recorded: Sep 21 2017 34 mins
    Scott Donohoo, Technology Solutions Professional, Microsoft, & Erik Zwiefel, Technology Solutions Professional, Microsoft
    This session will cover how Azure Machine Learning and R can help data scientists overcome the following challenges:
    - Development Time - Dramatically reduce the time of running initial ML experiment validations.
    - Performance - Option for best in class performance.

    For deeper data science needs the session will explore how hard core data scientists can leverage R to attack the most complex scenarios.
  • Data Governance, Discovery, & Lineage in a Heterogeneous Streaming Platform Recorded: Sep 21 2017 42 mins
    Barbara Eckman, Principal Data Architect, Comcast
    Data governance, discovery and lineage help data scientists find and integrate data of interest to uncover otherwise hidden trends, anomalies, and powerful predictors of business successes and failures. Comcast’s Streaming Data Platform comprises a wide variety of ingest, transformation, and storage services. Peer-reviewed Apache Avro schemas support end-to-end data governance. Apache Atlas is our metadata repository for data discovery and lineage. We have extended Atlas with custom data and process types, e.g., Avro schemas, AWS S3 buckets and prefixes, Kafka topics, and Kinesis streams. Custom asynchronous messaging libraries notify Atlas of new data and schema entities and lineage links as they are created.
  • How We Built a Scalable, Fast, & Reliable Indexing Infrastructure Recorded: Sep 20 2017 29 mins
    Navin Agarwal, Principal Engineer, BloomReach
    At BloomReach we process around 100 million products every day across all of our customers. For each customer, the feed processing needs to be fast and reliable, and indexing shouldn't have any impact on serving. We will walk through how we've built this at BloomReach while also making sure that the cost is minimal.
  • Creating Real Value: Bridging the Gap Between Analysis and Action Recorded: Sep 20 2017 30 mins
    Dillon Morrison, Platform Alliances Manager, Looker & Andrew Wynn, Product Manager, Looker
    Looker is the modern data platform that democratizes data analytics, creates meaningful insights, and powers critical business actions. The Looker platform allows you to analyze your data and act on it within a single interface, enabling both business users and analysts to add maximum value where it matters most. Stop jumping between tabs and tools - do it all in Looker.
  • Accelerate Time to Value with Data Operations Recorded: Sep 20 2017 44 mins
    Saket Saurabh, Co-Founder & CEO, Nexla
    Today, 91% of companies are ingesting data from third party partners to run their businesses. Additionally, 70% of companies either currently send or plan to send data to partners. This inter-company data collaboration powers insights, machine learning, and better consumer experiences. But, it also increases workloads for strapped engineering teams and creates challenges to data access. Learn how companies are streamlining and even automating their Data Operations to accelerate the time from data to business value.
  • On-Demand Analytics: Building Big Data Solutions with Azure Data Lake Recorded: Sep 20 2017 24 mins
    Cathy Palmer, Ph.D., Principal Program Manager, Microsoft
    Enterprises are building big data solutions with Azure Data Lake, an on-demand, real-time stream processing service with a no-limits data lake built to support massively parallel analytics. Patterns of enterprise solutions are emerging and evolving as customers migrate their analytics workloads to the cloud and embrace new business opportunities. With an overview of Azure Data Lake, this webinar briefly explores some of the choices customers are making in building big data solutions with Azure Data Lake.
  • Qubole and Talend: A Match Made in the Cloud Recorded: Sep 20 2017 24 mins
    Shawn James, Sr. Director Technology Alliances, Talend
    Talend provides the data agility businesses need to use the latest cloud technologies to act with insight across their organization and win in an economy being deeply transformed by exploding data volumes, technology innovation, and fundamental changes to the IT infrastructure. Join us to learn how Talend and Qubole together help companies’ business users execute data preparation workloads in the cloud at a fraction of the cost and resources.
  • Deep Learning for Biotechnology on Qubole Recorded: Sep 20 2017 38 mins
    Matt Der, Chief Technology Officer, Notch.io
    In the biological sciences, hypothesis-driven experiments and bottom-up design experiments rely on predicting what will happen with new cells and molecules. Machine learning excels at prediction and has become more democratized, making it an important component in the biotech toolkit. We use Merck's Kaggle competition as a representative task in this domain that involves predicting molecular activity from numeric descriptors of chemical structure. Our approach utilizes deep neural networks using the Keras library in a Qubole notebook, which is conveniently attached to an autoscaled Spark cluster. We use Spark to distribute the hyperparameter search for optimizing the neural net.
Elemental to Big Data
At our core, we are a team of engineers who live, eat, and sleep big data. We believe that ubiquitous access to information is the key to unlocking a company's success. To achieve this, a big data platform must be agile, flexible, scalable, and proactive to anticipate a company's needs.

  • Title: How to Choose the Right Cluster for Your Workload - A Cluster Comparison
  • Live at: Sep 13 2017 6:00 pm
  • Presented by: Alex Aidun, Director of Education Services, Qubole